




DESCRIPTIVE AND 


SAMPLING STATISTICS 









Under the editorship 
of 

GARDNER MURPHY 



DESCRIPTIVE AND 


SAMPLING STATISTICS 


BY 

JOHN GRAY PEATMAN 

Associate Dean and
Professor of Psychology

The City College of New York



HARPER & BROTHERS PUBLISHERS 

NEW YORK AND LONDON 







DESCRIPTIVE AND SAMPLING STATISTICS 
Copyright, 1947, by Harper & Brothers 
Printed in the United States of America 



All rights in this book are reserved. 

No part of the book may be reproduced in any 
manner whatsoever without written permission 
except in the case of brief quotations embodied 
in critical articles and reviews. For information 
address Harper & Brothers 



To Lee, Alice, John, and Bill 




Contents


PREFACE xv

PART I. DESCRIPTIVE STATISTICS 

1. INTRODUCTION TO STATISTICS 3

A. Historical Background 3

Gamblers and Kings. The Mathematicians. The Census—Vital
Statistics. Adolphe Quetelet—Social Scientist. Sir Francis
Galton—Geneticist. Correlation. Statistical Prediction
Actuarial, not Individual.


B. Descriptive vs. Sampling Statistics 9

The Concept “Statistics”—Its Various Meanings. Description
vs. Sampling. The Reduction of Data.

C. The Nature of Statistical Data 13

Non-Variable Data. Variable Data. The Treatment of Statistical
Data. The Mathematical and Logical Implications of a
Variable—A Series. Exact and Approximate Measures.

2. THE REDUCTION AND ORGANIZATION OF CATEGORICAL DATA 19

A. Introduction 19 

B. The Classification and Enumeration of Attributes 19 


Dichotomous and Polytomous Classifications of Attributes.
Classification vs. Division. Rules for Logical Division and
Classification. Classification of Judgments, Attitudes, and
Opinions. Classification of Don’t Know’s (DK’s) in Market
Research Investigations. The Statistical Frequency.
Enumeration vs. Measurement. Stratification—An Opinion Poll.


C. Methods for Treatment of Original Data 37 

The Hand-Sorting of Statistical Data. Machine Tabulation.
The Findex System of Coding and Analysis.

3. THE COMPARISON OF CATEGORICAL DATA: PROPORTIONS, PERCENTAGES,
RATIOS, INDEX NUMBERS 43

A. Ratios and Percentages 43 

Proportions. Rounding Off Numbers.

B. Use of Percentages for Comparing the Parts of Two or More
Wholes 49



C. Ratios and Index Numbers 52 

Per Capita Indices. Ratios as Index Numbers. 

D. Confusion in the Use of Percentages 55 

Confusion in Interpreting a Percentage Increase. A Percentage
Decrease Can Never Be More Than 100%. Confusion Between
Percentages and Proportions. Confusion from Large
Percentages. Percentages from too Small a Base. Errors in Averaging
Percentages.

E. Graphic Methods for the Presentation and Comparison of
Categorical Data 58

Bar Graphs. Belt Graphs. Pie Diagrams. Maps. Pictorial 
Charts. 

4. THE CORRELATION OF CATEGORICAL DATA 80 

A. The Cross-Tabulation of Categorical Data 80 

Cross-Tabulation Essential to Correlation. The Correlation 
of Non-Variable Attributes. The Correlation of Polytomous 
Attributes—Market Research Data. 

B. Methods for the Correlation of Categorical Data 90 


Yule’s Coefficient of Association (A) for Dichotomized
Non-Variable Attributes. The Correlation of Dichotomized
Variables: The Phi Coefficient. The Correlation of Polytomous
Attributes: The Contingency Coefficient.

5. THE REDUCTION AND ORGANIZATION OF VARIATE DATA 99 

A. Introduction 99 

B. The Range and Array 99 

The Range as a Comparative Measure. The Array. 

C. The Frequency Distribution 103 

The Class Interval. The Tally. The Frequency Distribution. 

D. The Histogram and the Frequency Polygon 113 

The Histogram. The Frequency Polygon, or Line Graph. 
Comparative Usefulness of the Histogram and Frequency 
Polygon. 

E. The Percentage Frequency Distribution 120 

F. The Cumulative and Percentage Cumulative Frequency
Distribution 121

The Cumulative Frequency Distribution. The Percentage 
Cumulative Frequency Distribution. Usefulness of Percentage 
Cumulative Graph for Comparing Distributions. 

6. THE CENTILE POINT METHOD FOR VARIATE DATA 127 

A. Centiles and the Description of Variate Data 127 

Centile Point Values vs. Centile Intervals. Quartiles, Terciles,
Quintiles, Deciles, and Vigintiles. Comparative Implications
of Centile Measures. The Determination of Centiles.

B. Centiles by the Graphic Method 131

The Centile Graph. Determining the Score Values of Centiles
from a Centile Graph.

C. The Computation of Centile Values 134

The Location of a Centile Point. Interpolating the Score Value
of a Centile Point. Checking the Computed Centile Value.
Comparison of Estimated and Computed Centile Values.

D. Centile Measures 139

The Median (A Measure of Central Tendency?). The D Range
—A Measure of Dispersion. The Quartile Deviation—A
Measure of Deviation or Variability. The Tercile Deviation—
A Measure of Deviation or Variability.

E. The Use of Centiles for Comparing the Results of Two or More
Distributions of a Variable 142

F. The Use of the Centile Method for Comparing the Results of
Two or More Variables 146

7. THE MEAN AND STANDARD DEVIATION 150

A. The Method of Moments for Variate Data 150 

Basic Symbols. 

B. The Mean 151 


Definition. Method I: The Mean from Unordered Data. 
Method II: The Mean—Long Method with Data Grouped 
into a Frequency Distribution. Method III: The Mean—Short 
Method with Grouped Data. 

C. The Standard Deviation 160 

Definition. Method I: Standard Deviation from Ungrouped
Data. Method II: Standard Deviation—Long Method with
Grouped Data. Method III: Standard Deviation—Short
Method with Grouped Data. Method IIIa: Standard
Deviation—Short Method with Ungrouped Data. Sheppard’s
Correction for σ.

D. The Average Deviation 168 

Definition. Method I: Average Deviation—Ungrouped Data. 
Method II: Average Deviation with Grouped Data. 

E. The Coefficient of Relative Variation 171 

8. COMPARATIVE IMPLICATIONS OF THE NORMAL, BELL-SHAPED CURVE 174 

A. Implications of M and σ for Normal, Bell-Shaped Distributions 174

The Mean as Point of Reference. The Mean as a Fulcrum. The
Median and Mean. Uni-Modality and the Mode. Bilateral
Symmetry. Points of Inflection and σ. Asymptotic Character
of the Normal Curve. The Practical Limits Equal M ± 3.0σ.
σ as the Standard Measure of Variability. Measures as z Scores.
z Scores Signify Relative Position in a Series. Centile Implications
of Standard Measures. Summary of Commonly Used
Measures of Dispersion About the Mean. The Normal Probability
Curve. The Formula for the Normal Curve. Relationship
Between Various Measures of Variability in a Normal Distribution.

B. The Use of z Scores and Standard Scores for Comparative
Purposes 181

Standard Scores. Standard Score Norms. The Standard Score 
Profile Chart or Psychograph. 

9. THE PRODUCT-MOMENT METHOD FOR THE CORRELATION OF VARIATES 195 

A. The Linear Correlation of Bi-Variates 195 

Pearson’s Product-Moment r. The Cross-Tabulation of
Bi-Variate Data. The Scattergram of Bi-Variate Data. The
Assumption of Linear Correlation. Plotting the Bi-Variate
Data of a Scattergram. The Correlational Frequency: Paired
Associates. The Correlation Chart.

B. Estimation of Product-Moment r 208 

Fitting Linear Regression Lines to Bi-Variate Distributions.
The z Score Correlation Chart. The Regression Line for zy on
zx. Estimating r. The Regression Equation of zy on zx. The
Regression Equation in Descriptive Statistics. The Regression
of zx on zy. The Regression Equation for zx on zy. The Regression
Coefficients. Regression Equations Expressed in Terms of
x and y. Standard Formula for r.

C. Computation of Product-Moment r 225

Summary of Mathematical Implications of r. Various Methods 
for the Computation of r. 

D. Method I: Product-Moment r from Ungrouped Data (Long
Method) 226

Order of Operations for Method I. Shortcomings of Method I. 

E. Method II: Product-Moment r from Grouped Data (Short
Method) 229

The Frequency Distributions of Each Variable from the
Correlation Chart. The Standard Deviations of Each Variable from
the Correlation Chart. The Product Deviations. Ratio for r.
Checking Σ(x′y′). Means and Standard Deviations from the
Correlation Chart.

F. Method III: Product-Moment r from Ungrouped Data (Machine
Method) 236

Machine Computation. The Guessed Means Taken as Equal
to Zero. The Formula for r (Method III). Inter-Correlation
Coefficients. Work Sheet for Original Data and Computation
of Means, Squares, and Cross-Products (Table 9:3). Computation
of Standard Deviations of All Variables (Table 9:4).
Computation of the Mean of the Product Deviations of Each
Bi-Variate Distribution (Table 9:5). Computation of the
Correlation Coefficients (Table 9:6).

G. Other Methods for the Computation of r 247

The Method of Sums for r. The Method of Differences for r.

10. SPECIAL METHODS FOR THE LINEAR CORRELATION OF VARIABLES 253

A. Correlation of Ranks 253

Purpose of the Method. Spearman’s Rank-Difference Method.
The Relation of r to Rho.

B. Serial Correlation 258

Biserial Correlation. Point-Biserial Correlation. Triserial,
Quadriserial, and Quintiserial r.

C. Tetrachoric Correlation 275

Purpose of the Method. The Computation of Tetrachoric r (rt).
Estimating Tetrachoric Correlation with Thurstone’s Diagrams.

PART II. SAMPLING AND ANALYTICAL STATISTICS 


11. SAMPLES AND SAMPLING TECHNIQUES 283 

A. Introduction 283 

Census vs. Sample. Sampling Is a Research Technique. 

B. Statistical Populations or Universes 288 

The Statistical Universe. Finite and Infinite Populations. 
Actual vs. Hypothetical Universes. 

C. Samples and the Techniques of Sampling 290 

Representative Samples, Biased Samples. 

D. Random Samples—The Principle of Randomization 294

Definition. The Technique of Random Sampling. The Sampling 
Unit. 

E. Stratified-Random Sampling 299 

Definition. Stratifying Factors. The Technique of
Stratification. The Inter-Relation of Stratifying Factors. Sub-Universes
in Stratified-Random Sampling. Internal Controls in Sampling.
Areal Sampling. The Technique of the Master Sample. The
Random-Point Method of Sampling. The Stratified-Quota
Method of Sampling. Chief Source of Error in Stratified
Sampling. The “Representativeness” of Stratified Samples.

F. Some Further Considerations About Sampling 313 

Precision and Adequacy in Sampling. The Character of Samples
vs. the Size of Samples. Accidental Samples. Restricted
Universes and Partial Investigations. The Analysis of Intra-Group
Differences in Sampling. Sampling in the Experimental Method
of Equated Groups. Experimental Method with Random
Samples.

G. Some Terminological Distinctions for Sampling and Analytical
Statistics 322

Parameters and “True Measures.” Statistics. Symbols for the
Differentiation of Parameters and Statistics. Sampling
Distributions. Small Sample Theory vs. Large Sample Theory.
The Standard Error of a Statistic. Statistical Hypotheses.
The Probable Error of a Statistic. Sampling Error and Error of
Measurement.

12. PROBABILITY AND STATISTICAL INFERENCE 328 

A. The Statistical Concept of Probability 328 

Definition of Probability. A Single Event Has No P Value—
The Concept of Likelihood. Strict Causality vs. Statistical
Relations.

B. The Binomial Distribution and the Normal Probability Curve 331 

Normal Sampling Distributions. Binomial for Samples of
Na = 2. The Product and Addition Theorems of Probability.
Binomial for Samples of Na = 3. Binomials for Larger Samples.
The Expansion of the Binomial for the Normal Probability
Curve. The Probability of a Result Derived from the Normal
Probability Distribution. A Test of Significance (T). The
Evaluation of the Test of Significance. The Distribution of
Frequencies in the Normal Probability Distribution.

C. Small Sample Theory—Leptokurtic Sampling Distributions 347 

Kurtosis (Ku). The t Statistic. When Is a Sample Small? 

D. Skewed Sampling Distributions and Normal Probability 349 

The Binomial When p ≠ q.

E. The Precision (Reliability) of Sample Results and the Size of
Samples 353

Precision Measured by the Standard Error. Precision Generally
a Function of √N. Precision and Reliability.





13. HYPOTHESES AND TESTS OF SIGNIFICANCE 360 

A. Likelihood and Confidence Criteria 360 

Postulation of Parameters. Hypotheses Give Direction and 
Meaning to Research. The Probability Estimate. The Test of 
Significance and the Test Ratio (T). Likelihood and Confidence 
Criteria. Confidence Criteria in Terms of T Ratios. 

B. Confidence Limits: Testing a Continuum of Hypotheses 368 

Many Statistical Hypotheses Can Be Tested. Fiducial Limits 
and Confidence Limits. The Reliability of a Statistic. 

C. Summary of Steps for the Testing of Hypotheses 371 

D. Tests of Significance for Some Commonly Used Statistics 373 

Percentages. Proportions. Frequencies. The Arithmetic 
Mean. Test Scores and Other Measures. Standard Deviations. 
Average Deviation. Centiles. Product-Moment Correlation 
Coefficients. Other Correlation Coefficients. Skewness and 
Kurtosis of Distributions. 

E. The Probable Error and Tests of Significance 393 

F. Tests of Significance for Small Samples 397

Fisher’s t Statistic. Probability Values for t.

14. TESTS OF SIGNIFICANCE FOR DIFFERENCES BETWEEN STATISTICS 401 

A. The Standard Error of a Difference Between Any Two Statistics 401 

Standard Error of a Difference for Independent Samples. 

B. Tests of Significance for a Difference Between Any Two Statistics 403

Confidence Criteria for the Significance of a Difference. 

C. A Difference Between Percentages (or Proportions) Derived from
Non-Correlated Samples 404

D. A Difference Between Percentages Derived from Correlated
Samples 407

E. A Difference Between Arithmetic Means Derived from
Non-Correlated Samples 409

Fisher’s Null Hypotheses for Differences. 

F. A Mean Difference Between Correlated Samples 412 

Effect of Heterogeneity of “Matched Samples.” 

G. A Difference Between Standard Deviations 416 

Combining the Results of Several Groups for a Test of
Significance.

H. A Difference Between Coefficients of Relative Variation 418 

I. A Difference Between Product-Moment Coefficients of Correlation 419 





15. CHI-SQUARE AND TESTS OF SIGNIFICANCE 424 

A. Chi-Square for the Distribution of Non-Variable and Variable
Attributes 425

Calculation of Chi-Square. A Chi-Square Test of Significance of
Consumers’ Brand Preferences (a Dichotomy). The Probability
of Chi-Square. Degrees of Freedom (d.f.). Chi-Square as a Test
of Significance. A Chi-Square Test of Significance for a
Trichotomy. A Chi-Square Test of Significance for the
Distribution of a Variate.

B. Chi-Square Tests of Significance for the Independence of Two
Attributes 437

Chi-Square Tests of Significance for Correlation Between
Dichotomized Attributes. Pearson’s Short-Cut Computation of
χ² for 2 by 2 Cross-Tabulations. Chi-Square Test of Significance
for Correlation Between Attributes with More Than Two
Categories. Contingency Coefficient. Relation Between
and φ.

16. THE PREDICTIVE MEANING OF CORRELATION 445

A. Making the Prediction 447 

Predictions on a Correlation Matrix. 

B. The Accuracy or Efficiency of Predictions 451 

The Standard Error of Estimate. The Interpretation of the
Error of Estimate. Graphic Representation of the Accuracy of
Predictive Estimates. The Index of Predictive Efficiency (E).
Standard Error of Estimate for the Mean. Tests of
Significance for Predictive Estimates. Summary.

17. CORRELATION METHODS FOR THE EVALUATION OF PSYCHOLOGICAL
TESTS 464

A. The Reliability and Validity of a Barometer and of a
Psychological Test 465

The Barometer. The Psychological Test. 

B. The Determination of Test Reliability 470 

Test Reliability by the Method of Test-Retest (rxx). Test
Reliability by the Method of Alternate Forms. Test
Reliability by the Split-Half Method. Test Reliability by
the Method of Item-Intercorrelation. Effect of Range of
Ability on Test Reliability.

C. The Determination of Test Validity 478 

Operational Validity. Functional Validity. Validity Criteria— 
Abilities vs. Aptitudes. Effect of Range of Ability on Test 
Validity. 





D. Test Item Analysis 481 

Item Reliability and Validity. Biserial and Fourfold
Correlation Techniques.

E. Multiple Correlation (R) 482 

Predicting Academic Success from Two Variables. Predicting
Clerical Efficiency from Two Variables. The Multiple
Regression Equation and the Standard Error of Estimate of R.

F. Partial Correlation 485 

Partial Correlation with Scholastic Aptitude Held Constant.
Partial Correlation with Age Held Constant. Spurious
Correlation.

18. CLUSTER AND FACTOR ANALYSIS 489 

A. Theory of the Organization of Human Traits 489


The Coefficient of Determination (r²). Spearman’s Two-Factor
Theory. Multiple-Factor Theories. Sampling Theory and
Cluster Analysis.

B. Methods of Factor Analysis 492 

Tryon’s Method of Correlation Profile Analysis. Cluster
Analysis of Body Measurements. Cluster Analysis of
Psychological Variables. Some General Implications of Factor
Analysis.


APPENDIX A. Bibliography of Statistical Tables and Nomographs,
Periodical Literature, and Chief References in Mathematical and Advanced
Statistics. 505

APPENDIX B. Tables of Statistical Functions. 507 

I. Areas and Ordinates of the Normal Probability Curve 508

IA. Ordinate Values of the Normal Curve Expressed as Proportions
of the Ordinate at the Mean 511

II. Probability Values for T of Normal Sampling Distributions of
Large Sample Theory 512

III. Distribution of t for Small Samples 514 

IV. Distribution of Chi-Square 515 

V. Values of Functions of r 516 

VI. Values of Fisher’s z Function for Values of r 518 

VII. Values of Proportions, p and q 519 





APPENDIX C. Tables of Squares, Square Roots, Reciprocals, and Random 
Numbers 521 

I. Squares, Square Roots, and Reciprocals of Integers from 1 to 
1000 522 

II. A Table of Random Numbers 543 

GLOSSARY OF STATISTICAL SYMBOLS 547 

GLOSSARY OF PRINCIPAL STATISTICAL FORMULAS 551

INDEX 565 



Preface 


Statistical method is a fundamental and necessary tool for research workers
in the social and biological sciences. It needs no more justification for its
existence in these fields than does applied mathematics in the fields of
engineering and the physical sciences. The methods of statistics are methods of applied
mathematics; they are essential working tools for social and biological
scientists because they provide the necessary scientific methodology for obtaining,
organizing, summarizing, and analyzing research data.

Statistical method is not presented in this book as a discipline to be studied 
for its own sake; such an approach would be essentially mathematical. Rather, 
the emphasis is on its presentation as a useful and necessary tool for research 
problems in psychology and the closely related fields of education, cultural 
anthropology, and sociology; and considerable attention has been given to the 
use of statistics in public opinion and market research. 

The presentation of statistical method as a research tool can be treated in
various ways. Thus interest can be focused solely on the methods of
computation, with the reasons for the methods, the logic of their application, and their
value for particular problems left to the student’s imagination (or the
instructor’s); on the other hand, computational methods may be given practically no
emphasis. We have attempted a balanced presentation that will teach the
student not only how to compute a statistical measure but when to use a
particular technique and how to interpret a result. Some mathematicians may feel
that no student can attain a satisfactory grasp of statistics without a
knowledge of the mathematical bases and their implications. Certainly there is no
question but that this knowledge is both important and helpful. However, the
student who is interested primarily in a social science and only secondarily in
statistics—as a means to an end, a tool—can obtain a sound working
knowledge of the subject without, for example, being able to differentiate the normal
probability distribution by means of the calculus. Such a student has
fundamentally a fourfold need: (1) an appreciation of the usefulness of statistical
method in his field; (2) an understanding of the logic underlying its application;
(3) the ability to select the most relevant statistical technique and to make the
necessary computations with a minimum of error; and (4) the ability to
interpret a statistical result in a way justified by the character of the data.

Descriptive and Sampling Statistics is designed as a text for an introductory
one-year course for either undergraduate or graduate students. Each of the
two parts into which the book is organized—Descriptive Statistics, and
Sampling and Analytical Statistics—contains sufficient material for a semester
course of 45 to 60 hours. Most of the various statistical methods are developed
by presenting both the type of problem for which each method is required
and the logical basis for the statistical solution. Part II contains a chapter on
probability, presented as a preliminary to the development of Tests of
Significance, and also a chapter on sampling methods, because methods of
sampling are as integral to sampling and analytical statistics as is the manner of
treating the data derived from the samples. Contrary to the belief held by some
lay persons that conclusions based on statistics are dubious or useless, adequate
methods of sampling and measurement make it possible to draw conclusions
that are as reliable and useful as those based on other scientific methods. Only
in the hands of the inept or the fraud is there any justification for the popular
saying that there are three kinds of lies—defensive lies, base lies, and statistics.

Acknowledgments to various authors and publishers have been made
throughout the book. I am especially indebted, however, to Professor R. A. Fisher
and to Messrs. Oliver and Boyd, Ltd., of Edinburgh for permission to reprint
Table Nos. III and IV of Appendix B from their book, Statistical Tables for
Biological, Agricultural and Medical Research, and for the adaptation of Table
VI of Appendix B from this same work. For permission to reproduce various
charts, I am also indebted to the Editors of Broadcasting Magazine and to 
Radio Station WOR, New York City (Fig. 6:4); to the Editors of Fortune 
Magazine (Fig. 3:19); to the Institute of Public Administration, New York 
City (Figs. 3:7, 3:8, 3:11, 3:12, and 3:13); to the Public Affairs Committee, 
Inc., New York City (Figs. 3:14, 3:16, and 3:18); and to the New York Times 
Magazine and the Pictograph Corporation of New York (Fig. 3:17). 

I wish also to take this opportunity to acknowledge my indebtedness to 
Frederick E. Croxton, my first teacher in statistics, who inspired a lasting 
interest in the subject; to Gardner Murphy, Editor of this Series, for his 
many helpful and constructive suggestions; to Harriet Clemenson, Clare
Luhman, Mary McDonald, Georgette Schneer, Madeline M. Sherwood, and
Jean Brown Trapnell for their able and scrupulous assistance in the
preparation of my original manuscript; and to Dorothy Thompson, Production Editor
of the College Department of my publishers, for her competent and careful 
work in the final preparation of my manuscript for the press. 

To my students I wish to acknowledge a great indebtedness for their 
curiosity and stimulation which have long been a great satisfaction to me in 
the teaching of statistics. 


John Gray Peatman 










CHAPTER 1 


Introduction to Statistics 


A. HISTORICAL BACKGROUND* 

Statistics is a form of applied mathematics. It is a logical tool used in all the 
sciences and employed by all modern cultures. It is especially a tool of the 
biological and social sciences, a tool whose development has paralleled the 
practical demands of man’s needs in a diverse and complex world. 

Gamblers and Kings 

Statistics had its beginnings many generations ago as a result of the
interests and needs of gamblers and kings. The gamblers wished to develop systems
that would improve their skill at cards and dice. Kings wished to know more
about their subjects so as to work out more efficient taxing systems. Out of
the interests and needs of gamblers came the foundation of our modern 
theory of probability, a theory basic to sampling statistics. Out of the interests 
and needs of kings emerged vital and social statistics, statistics as a descriptive 
method for enumerating and classifying hundreds and thousands of classes 
of useful data. 


The Mathematicians 

After the gamblers and kings came the mathematicians. In 1657 there
appeared a brief treatment by Christian Huygens, the great Dutch
mathematician and physicist, of the chances of winning at certain card and dice
games. Three years earlier Pascal and Fermat had had their famous
correspondence, in which they established the fundamental principles of
probability. A little later Jacques Bernoulli, the Swiss mathematician, wrote the
first book on the subject of probability. It was published in 1713, after his 
death, by his nephew, Nicolas Bernoulli. The work is an historical landmark, 
especially because of its emphasis on the practical value of the theory of 
probability for social problems. But Jacques Bernoulli’s untimely death cut 
short the immediate development of many practical possibilities of statistics 
in social affairs. Such development waited another century, until the work of 
the Belgian, Adolphe Quetelet. 


* Cf. H. M. Walker, Studies in the History of Statistical Method, Williams & Wilkins,
Baltimore, 1929.




In the meantime, the theoretical development of statistics centered about 
the concept of probability initiated, as we have indicated, by Pascal and 
Fermat, and Jacques Bernoulli. In 1733, de Moivre gave the first
mathematical formulation of the normal probability curve (the curve of error), but little
attention was paid to it at the time. De Moivre attempted to remove the 
stigma of gambling from the problem of probability and to give the theory a 
divine flavor by maintaining: “And thus in all cases it will be found, that 
although chance produces irregularities, still the Odds will be infinitely great, 
that in process of Time, those Irregularities will bear no proportion to the 
recurrency of that Order which naturally results from Original Design.” * 

It was not, however, until toward the end of the eighteenth century and 
the beginning of the nineteenth that the theoretical development of statistics 
got under way as a broad and continuous enterprise. It was with the work 
of the great European mathematicians, Laplace and Gauss, and of the
physicists and astronomers that the scientific foundations were laid for the theory
of probability and the measurement of errors of observation. Gauss, “the
Prince of Mathematicians,”† was especially concerned with the practical as
well as the theoretical problems of astronomical measurement, and the normal 
curve of error was developed for the variable results of observation with the 
mean of a series of observed values taken as the most probable value of the 
measure sought. 

In this work of the mathematical astronomers it is evident that theoretical 
statistics was developing in conjunction with some empirical problems of 
measurement. However, the broad foundations of descriptive statistics (which 
Quetelet later integrated with the theoretical) for the study of social
phenomena were established by government officials and political economists.

The Census—Vital Statistics

We have seen that kings had long been interested in enumerating those
of their subjects who could pay taxes. They had also long been interested in
the number of subjects who could render military service. The registering
of baptisms, marriages, and deaths was begun in a few places in Europe during
the fourteenth and fifteenth centuries. Such data formed the basis for the
beginnings of descriptive statistics, and by the seventeenth century census
taking had its systematic start. According to Godfrey,‡ the first census of
modern times to be conducted under that name was taken in Canada in 1666.
The data reported filled 154 pages and included facts about the population
such as sex, family and conjugal status, age, profession, and trade. More

* Cf. H. M. Walker, Studies in the History of Statistical Method, Williams & Wilkins,
Baltimore, 1929, p. 17.

† Cf. E. T. Bell, Men of Mathematics, Simon & Schuster, New York, 1937, chap. 14.

‡ E. H. Godfrey, Section on Canada, in John Koren (ed.), The History of Statistics:
Their Development and Progress in Many Countries, Macmillan, New York, 1918,
pp. 179-198.





recently, however, Dr. Carlos Castañeda, Latin-American authority at the
University of Texas, has reported that the first census on the North American 
continent was conducted by the alcaldias mayores of New Spain between 1570 
and 1580 at the command of King Philip II of Spain.* Philip wanted to 
know how many people there were, the family income, members per family, 
the amount of taxes they paid, and on what and with what they paid their 
taxes. Altogether there were 150 questions for each family to answer. 

The end of the seventeenth century saw the publication of mortality tables
by the English astronomer, Halley, in 1693. Annuity tables for insurance
societies made a marked empirical development in the eighteenth century
because of the vital statistics which had been collected by that time. The
revolutions in America and France further stimulated the interest in data
about the masses of population. Our Articles of Confederation provided for
a triennial census, but this was changed to a decennial basis when the
Constitution was adopted; and 1790 saw the first official census of the newly
formed United States of America. 


Adolphe Quetelet (1796-1874)—Social Scientist



It was Quetelet who developed statistical method as a scientific research
tool in the study of man and the social sciences. Quetelet was a university
teacher, mathematician, astronomer, and anthropometrist, as well as his
country’s supervisor of official statistics and hence responsible for the first
nation-wide census. It was Quetelet who brought together the theoretical and
empirical foundations of statistics, integrating and developing them for the
investigation of social phenomena. He combined a mathematical interest in
the theory of probability with a passion for the collection of data about
people. Time and again, during the nineteenth century, he emphasized that
the basic techniques of statistical method are the same whether we are
studying the stars or man, the weather or morals. It was Quetelet who developed
the concept of the average man—l’homme moyen—insisting that in the sphere
of human activities all is not individual and immeasurable. In 1831 he
reported a study on tendencies to crime at different ages, in which he analyzed
the role of such factors as sex, education, and climate in criminal tendency.
Just as we are often startled by predictions about the number of deaths from
accidents to be expected on the Fourth of July or for a given period from
automobile traffic, so Quetelet was impressed by the relative constancy of the
number of crimes from year to year: “Thus we pass from one year to another
with the sad perspective of seeing the same crimes reproduced in the same
order and calling down the same punishments in the same proportions. Sad
condition of Humanity! . . . We might enumerate in advance how many
individuals will stain their hands in the blood of their fellows, how many will

* C. D. Gastenada, in the New York Herald Tribune, July 7, 1940; also direct corre¬ 
spondence. 




6 


INTRODUCTION TO STATISTICS 


be poisoners; almost we can enumerate in advance the births and deaths 
that should occur. There is a budget which we pay with frightful regularity; 
it is that of prisons, chains and the scaffold.” * 

Quetelet was criticized as a materialist by many of his contemporaries
because he dared to suggest that the moral worth of a man might be inferred
from measurements of his actions, that the intellectual vitality of a man
might be deduced from what he produced. He was confident that the mental
and moral traits of man could be measured and that, when measured, the
distributions of such traits would be shown to conform to the so-called normal
law. The normal probability curve, which is illustrated in Fig. 1:1, came
practically to be deified, and no wonder. As large samples of data of various
characters of men, of biological and social phenomena, came to be measured,
the distributions were often found to approach the form of this curve.

Sir Francis Galton (1822-1911)—Geneticist 

After Quetelet, but contemporary with him as a statistician for a
generation, came Sir Francis Galton. Like Quetelet, Galton also made extensive
use of the normal probability curve in the description of biological and social
phenomena. Like Quetelet, Galton saw in statistical method the means of
discovering regularity and lawfulness in phenomena which otherwise, by
their diversity and complexity, seemed individual and unique. Galton
suggested the use of the normal curve in the assigning of grades, or class marks,
in the schoolroom. Like Quetelet, Galton had a great passion for observation,
for recording data and analyzing them by the methods of statistics, many
of which he himself developed as the need arose. It was Galton who
discovered the method of statistical correlation, a discovery made in connection
with the need for analysis in his studies of the inheritance of traits. It is no
exaggeration to describe this discovery of Galton’s as one of the greatest
contributions ever made to the empirical development of the biological and
social sciences.


Correlation 

The need for the technique of correlation is aptly illustrated by some of
Bowditch’s problems which he was unable to answer adequately, as he
himself recognized. With the object of improving school application in growing
children, the Massachusetts Board of Health sponsored the study by
Bowditch, reported in 1877.† Descriptive statistics of nearly 25,000 children were
obtained, including not only bodily measurements and age, but also
nationality, place of birth, and occupation of parents. Bowditch wished to analyze
this tremendous mass of data for such relations as might be relevant to the
original problems of the inquiry. He wanted to know, for example, what
relation there was between the height and weight of the children. He saw that
there was a relationship, but the technique of correlation was not yet
available with which to formulate a determinate answer regarding the degree or
character of the relationship.

* F. H. Hankins, Adolphe Quetelet as Statistician, Columbia University Studies in History,
Economics and Public Law, No. 84, New York, 1908.

† A. P. Bowditch, “The Growth of Children,” Report of the Board of Health of
Massachusetts, 1877; reprinted in Bowditch’s Papers on Anthropometry, Boston, 1894.

That there is an empirical basis for a possible relationship is obvious since
height and weight are both attributes or traits of individuals. The real
question, however, does not relate to the individual case. It is an actuarial or
group question. We know that some people are likely to be tall, some short,
some heavy, some light. Persons, even of the same age, thus vary in height
and weight. Height and weight are therefore called variables or variates.
Quetelet and others established the fact that there is a very real tendency
for a large random sample of persons of a given age to have weights or heights
which, when systematically organized into a series according to size, form a
distribution which is similar to that of the normal probability curve (see
Fig. 1:1).

Fig. 1:1. The Normal Probability Curve

The abscissa (horizontal) axis represents the scale of measures or scores of a
variable attribute or trait. The ordinate (vertical) axis represents the frequencies
of the distribution. The higher the curve at any point, the greater the number of
frequencies or instances for the measures at that point. The point of greatest
concentration of frequencies is in the center of the distribution, at M, the mean.
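The shape described in the caption to Fig. 1:1 can be tabulated numerically. The sketch below is ours, not the author’s: it assumes the usual formula for the normal density with mean M and standard deviation sd (a measure of spread treated in later chapters), and the function name normal_pdf is hypothetical. Printing a few ordinates shows the curve highest at the mean and falling off symmetrically on either side.

```python
import math


def normal_pdf(x: float, mean: float = 0.0, sd: float = 1.0) -> float:
    """Height (ordinate) of the normal probability curve at the measure x."""
    z = (x - mean) / sd  # distance from the mean in units of sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


# Ordinates at whole-unit steps on either side of the mean M = 0.
for x in [-3, -2, -1, 0, 1, 2, 3]:
    print(f"{x:+d}  {normal_pdf(x):.4f}")
```

The same function with, say, mean=100 and sd=15 describes a curve centered on a score of 100; only the location and width of the curve change, not its form.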

The question of relationship between two variable attributes like height
and weight is whether individuals of average height are also of average weight;
whether very tall individuals are also very heavy; whether very short
individuals are also very light. In other words, the question is whether weights
and heights, when paired according to the persons from whom they are
obtained, vary together in any systematic way. This is the problem of
co-variation or correlation, and it is complicated by the fact that rarely, if ever,
do the measured attributes of biological or social phenomena exhibit perfect
or complete correlation. Persons who weigh, say, 160 pounds do not all have
the same height; rather, they vary in height. Similarly, persons of a given
height, say 6 feet, vary in weight. The statistical problem here is one of
determining the form and degree of any tendency for weight and height to
vary together. The details of the statistical technique of correlation demanded
by this kind of problem, the problem of possible co-variation, will be
considered in Chapter 9. Here we wish only to emphasize that the technique
discovered by Galton has been indispensable to the modern development of the
biological and social sciences.
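The technique itself is deferred to Chapter 9, but a brief sketch may make the question concrete. The code below computes the product-moment coefficient of correlation, the form later given to Galton’s idea by Karl Pearson, for a handful of paired heights and weights. The data and the function name pearson_r are hypothetical. A value near +1 means the two measures tend strongly to vary together, though, as the text stresses, not perfectly.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation of paired observations xs[i], ys[i]."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two pairs of equal length")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Sum of cross-products of deviations, and sums of squared deviations.
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical heights (inches) and weights (pounds), paired by person.
heights = [60, 62, 64, 66, 68, 70, 72]
weights = [110, 125, 120, 140, 150, 145, 170]
print(round(pearson_r(heights, weights), 3))
```

On Python 3.10 or later, statistics.correlation(heights, weights) from the standard library should give the same value; the hand-written version is kept here so the arithmetic is visible.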

It is by the statistical technique of correlation that we are today able by
comparatively simple methods to investigate relations between the attributes
of individuals, or of organisms generally, as well as relations between the
attributes of other kinds of natural and social phenomena. What is the nature
of the relation, if any, between the I.Q.’s and school grades of children,
between the tested achievements of parents and their offspring, between the
manual abilities of siblings? Is there any relation between temperature and
plant growth, between neighborhood status and delinquency, between the
protein content and proportion of vitreous kernels in wheat grains? Although
methods of investigating such questions as these are sometimes complicated,
the method of correlation itself remains a most powerful tool for the study
of possible relations among the variable attributes of natural and social
phenomena. It is again to be emphasized, however, that this method, as well
as statistics generally, is for the study of group phenomena, of masses of
instances. Inferences which can be made legitimately from statistical results
are about the group, not about the individual instance. Descriptively, such
results give us information about the group as a whole. Analytically, such
results may often be used for predicting what may happen in the long run or
on the average, but not in the individual case.

Statistical Prediction Actuarial, Not Individual 

We say that the chances are even that a tossed coin will land heads or tails.
We mean that in the long run a series of such tosses should give half heads
and half tails. What happens in the given, individual toss is strictly
determined, although we are unable to ascertain the determining conditions
so as to predict which side of the coin will lie uppermost. Our ignorance of
the many factors operating in the determination of the result is such, and
our knowledge of what happens in the long run for a fair coin is such, that
we say the chances are even, or fifty-fifty, that the coin will land heads or
tails. This is thus a verbal, somewhat metaphorical expression of our ignorance
about what will happen in the individual instance. Similarly, we say that the
chances are about even that a child to be born will be a boy or a girl. Again,
the metaphor is based (1) on our ignorance of the determining factors in the
given, individual instance, and (2) on the empirical facts of vital statistics
which have revealed for thousands of births that the ratio of boys to girls
is about 51 to 49.
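The long-run sense of “even chances” can be illustrated by simulation. The sketch below draws pseudo-random trials with Python’s standard random module; the function name long_run_proportion, the seeds, and the use of 0.51 as the probability of a male birth (read from the 51 to 49 ratio above) are illustrative assumptions. Short series wander noticeably, while long series settle near the stated probability, which is all the prediction claims; nothing is said about any single toss or birth.

```python
import random


def long_run_proportion(p: float, n: int, seed: int = 0) -> float:
    """Proportion of 'successes' in n independent trials, each with probability p."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    successes = sum(1 for _ in range(n) if rng.random() < p)
    return successes / n


# Fair coin: proportion of heads for increasingly long series of tosses.
for n in (10, 100, 10_000, 100_000):
    print(n, long_run_proportion(0.5, n))

# Births, assuming a probability of 0.51 that a given birth is a boy.
print(long_run_proportion(0.51, 100_000, seed=1))
```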

These two examples should serve to illustrate the actuarial or group
character of statistics. What is true for the proportion of heads and tails in coin
tossing, and of the sex ratio of births in vital statistics, is also true for all
statistical inference, in that predictions are actuarial and not individual. It
is well established in psychological and educational measurement, for example,
that there exists a real correlative relationship between the academic
attainments and intelligence test achievement of the school population in our
culture. Given a particular I.Q. score, say 70, obtained under optimum
conditions of measurement, we can predict that school children with such an
I.Q. will, on the average, be below average in their academic attainments.
That this is an actuarial or group inference should be obvious; nevertheless,
such a prediction is sometimes made for the individual child who is, after
all, either below average or not, in his academic attainments. And what he
will continue to do in his school work can be effectively and logically
predicted with confidence only as a result of studying him as the psychological
individual that he is. In dealing with the individual child, the psychologist
finds it useful and valid to draw upon his fund of statistical or actuarial
experience and information so long as he continues to focus his analytical
attention on the unique totality of the particular child.*
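The difference between a group prediction and an individual one can be shown with simulated data. In the sketch below, standardized test scores and attainments are generated with an assumed correlation of 0.5; that value, the helper simulate_pairs, and the reading of a score about 2 standard units below the mean as roughly an I.Q. of 70 are hypothetical choices, not figures from the text. The group of low scorers averages below the general mean, yet a noticeable share of its members stand above it.

```python
import math
import random


def simulate_pairs(n, rho, seed=0):
    """Return n (x, y) pairs of standard scores whose correlation is about rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        noise = rng.gauss(0.0, 1.0)
        # Mixing x with independent noise gives y a correlation of rho with x.
        y = rho * x + math.sqrt(1.0 - rho * rho) * noise
        pairs.append((x, y))
    return pairs


# Hypothetical population: test score x and attainment y, in standard units.
pairs = simulate_pairs(200_000, rho=0.5)

# Children scoring about 2 standard units below the mean on the test.
group = [y for x, y in pairs if -2.1 < x < -1.9]

mean_attainment = sum(group) / len(group)  # the actuarial (group) prediction
share_above_average = sum(1 for y in group if y > 0) / len(group)
print(len(group), round(mean_attainment, 2), round(share_above_average, 2))
```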

That a child has an I.Q. of 70 is useful information so far as the
psychologist determines as precisely as possible what the actual intelligence test
performance means for that particular child. In fact, the competent
psychological investigator uses an intelligence test chiefly for such a purpose,
for the light which the child’s performance may throw on his total personality.
In individual diagnosis and prognosis, the calculation of the I.Q. score itself
is incidental to this fundamental purpose.

We see, then, that the data and methods of statistics are for the study of
group or mass phenomena. And statistical inferences are actuarial in
character, i.e., they are inferences about what happens or may happen in the long
run, or on the average.

B. DESCRIPTIVE VS. SAMPLING STATISTICS 

The Concept “Statistics”—Its Various Meanings 

We have been using the term statistics mainly to refer to a method. This
is because we are primarily concerned in this book with statistics as a
scientific method of description and analysis. However, it is well to note that the
word statistics is also used to denote the data or information about
populations, about biological and social phenomena, that can be measured or
enumerated. Although this latter use of the term has been suggested in the
preceding pages, we wish specifically to differentiate statistics as
information from statistics as method.

* Cf. in this regard the comments on prediction by G. W. Allport in his presidential
address to the American Psychological Association, 1939: “The Psychologist’s Frame
of Reference,” Psychological Bulletin, 37:1-28, 1940, especially pp. 16-18.

Statistics as information represents perhaps the most general use of the 
concept. Today there are literally thousands of publications presenting 
statistical information of various kinds: vital statistics, statistics of health 
and medical care, statistics of education, of social security and of labor, 
statistics of crime, statistics of governmental finance, statistics of agriculture, 
manufactures, minerals, of housing and building construction, of wholesale 
and retail trade, of public utilities, of money and banking, of security markets 
and corporations, statistics of international trade, of business activity, of 
commodity prices, of consumption, and of national income and wealth.
Although we are not directly concerned with statistics as information, the
student of psychology, anthropology, sociology, or education should be 
familiar with sources of statistical information relevant to his field of research. 
A short bibliography of source material is appended to serve this purpose 
(see Appendix A). 

Statistic vs. Parameter Values 

Another distinction in the use of the concept statistics arises in the study
and analysis of populations by sampling methods. Any summary numerical
values obtained from samples of data, such as measures of an average, of
deviational tendency, of correlation, etc., are characterized as statistics. Such
statistics are contrasted with parameter values, which are these same types
of measures but are for a statistical population as a whole, rather than for
only a sample of the population.
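The distinction can be made concrete with a small simulation. The sketch assumes a hypothetical population of 10,000 integer scores: the mean of the whole population is a parameter value, while the mean of a random sample of 100 drawn from it is a statistic. Drawing a second sample would generally give a different statistic, while the parameter stays fixed.

```python
import random


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


rng = random.Random(42)

# A hypothetical, finite statistical population of 10,000 scores.
population = [rng.randint(50, 150) for _ in range(10_000)]
parameter = mean(population)  # summary value for the population as a whole

# A random sample of 100 cases, drawn without replacement.
sample = rng.sample(population, 100)
statistic = mean(sample)  # the same kind of measure, for the sample only

print(round(parameter, 2), round(statistic, 2))
```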

Statistical Method vs. Statistical Inference 

A distinction is also sometimes made between statistical method and 
statistical inference. This is, however, a somewhat ambiguous and
unnecessary distinction, since statistical inference is integral to statistics as a method.

Description vs. Sampling 

A more useful distinction can be made with respect to the nature of
statistical method itself, viz., the methods of descriptive statistics, on the one hand,
and those of analytical or sampling statistics, on the other. Since this
distinction has fundamental and important implications for statistical method
in scientific research, and since this book is organized on the basis of the
contrast, we shall describe the difference between descriptive and sampling
statistics at this point and see more fully the implications of the distinction
as we proceed.

The fundamental distinction between descriptive and sampling statistics
is essentially as follows: In sampling statistics we study populations in terms
of the data of samples. In other words, the data of a part are used as the
basis for investigating or studying the whole. In descriptive statistics, on the
other hand, no distinction between part and whole is made; the data obtained
in a study are treated as if they constitute a whole. A census characteristically
presents data for the methods of descriptive statistics, since, by definition, a
census is a set of observations or measurements made for all members of a
group or population. A sample, by definition, characteristically presents data
for the problems and methods of analytical statistics. In both cases, however,
the initial task in the statistical treatment of results is the reduction of data.


The Reduction of Data 

One of the fundamental purposes served by statistical method is the
reduction of data. What, then, do we mean by the reduction of data? It consists
in the organization and summarization of data into forms that can be readily
perceived and understood. Consider the schedule for information or data
shown in Fig. 1:2, part of the school record for a child.


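Case No. ________                            Date of Record ________
Name ______________  ______________          Sex ________
       (Surname)         (First)
Address ____________________                 School ________
Birth date ________  Place ________          Grade ________
Father’s name ______________                 Occupation ________
Mother’s name ______________                 Occupation ________
Brothers ________                            Sisters ________
Mental Age ________  I.Q. ________           Test ________
Examiner ______________                      Date ________

Educational Achievement Record:

Area            Rating            Test            Date


Fig. 1:2. A Schedule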

Whether or not data are obtained in conjunction with the plan of an
experiment or in conjunction with the plan of an educational system (or other
agency) for maintaining relevant records, it should be obvious that a
systematic schedule for recording the data is a labor-saving device. When
possible, it is most convenient to arrange the data of a case on a card, the size of
which will of course depend upon the amount of information to be recorded.
A card 3 by 5 inches is about as small as can be easily manipulated; only
rarely is a card or sheet larger than 9 by 12 inches needed. The procedure of
recording is facilitated if the schedule is printed with appropriate descriptive
terms for the various categories of data or information to be entered.

In recent years the problem of recording and handling great masses of data
has been met by the development of the punch card, on which can be recorded
by machine any type of information that can be coded by a number system,
as well as original data which are numerical. To handle the coded data of
such cards, sorting machines and various kinds of tabulators have been
developed (cf. Chapter 2, Section C).

A schedule for a child’s school record is generally useful in two ways. (1) It 
is valuable for the teacher, psychologist, etc., who works with the individual 
pupil and attempts to iron out problems of that child’s adjustment to the 
school or other situations. This is a non-statistical, individual use of such 
information. (2) It is valuable for statistical purposes, which means that it 
is useful as one case in hundreds or thousands of such records which, when 
considered en masse or according to relevant groupings, may provide valuable 
information in the planning, financing, and management of an educational 
system. When the purpose is statistical, a given child’s schedule is more 
appropriately signified by a convenient number rather than by his name, 
since the investigator is no longer dealing with the individual child but 
with one case in a group or mass of statistical information. Such results as 
are obtained in the statistical treatment of the data will apply to the group 
as a whole rather than to an individual case. 
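The statistical use of such records can be sketched in a few lines. The example below is hypothetical throughout: the field names, the numeric code for sex, and the five records are invented for illustration. Each record carries a case number rather than a name, and counting cases by grade and sex mimics what a sorting machine does with coded punch cards.

```python
from collections import Counter

# Hypothetical code book: each category of sex is given a number,
# as it would be punched on a card.
SEX_CODES = {1: "Male", 2: "Female"}

# Hypothetical records, each identified by case number, not by name.
records = [
    {"case": 1, "sex": 1, "grade": 3},
    {"case": 2, "sex": 2, "grade": 3},
    {"case": 3, "sex": 2, "grade": 4},
    {"case": 4, "sex": 1, "grade": 4},
    {"case": 5, "sex": 2, "grade": 4},
]

# Count the cases falling in each (grade, sex) cell, as a sorter would.
counts = Counter((r["grade"], r["sex"]) for r in records)
for (grade, sex), n in sorted(counts.items()):
    print(f"Grade {grade}  {SEX_CODES[sex]:<6}  {n}")
```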

It should be apparent that it is impossible adequately to interpret the
records of hundreds or thousands of such schedules of information unless the
data are somehow classified and summarized. An investigator or research
worker is thus faced with the very practical problem of reducing a great bulk
of data to a form that will be more readily perceived and understood. The
procedures to be adopted for such classification and summarization depend
specifically upon the purposes of the investigation or inquiry. In general,
however, the statistical procedures that may be used include only a few
alternatives; they will be described in detail in the following chapters on
methods of descriptive statistics. Here it is emphasized that descriptive
statistical procedures serve a need which arises as soon as the observations or data
of any survey or inquiry become at all sizable or bulky. For, as R. A. Fisher
says, “No human mind is capable of grasping in its entirety the meaning
of any considerable quantity of numerical data.” * The statistical methods
used for the reduction of data are of three kinds:


1. Graphic methods
2. Computational methods yielding numerical measures
3. Tabular methods


The aim, then, of descriptive statistics is the reduction of data so that
the results of observation and measurement may be (1) made more
immediately meaningful, and (2) presented in a form that will make interpretation
and comparison of results easy and unambiguous. In the light of the
preceding discussion, descriptive statistics can now be defined as the organization
and summarization of collections of numerical data, including data arrived
at by the simple method of enumerating instances. Descriptive statistics
consists in the reduction of groups or masses of data by means of tables,
graphs, and numerical measures such as percentages or proportions, averages,
measures of deviation or dispersion, coefficients of correlation, etc.

* R. A. Fisher, Statistical Methods for Research Workers, Oliver & Boyd, London, 7th ed.,
1938, p. 6.

That the methods of descriptive statistics are essential to the methods of
analytical or sampling statistics is apparent. Whether the data are of a census 
or of a sample, the first step in their treatment consists in their appropriate 
reduction or simplification. 
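The three kinds of reduction listed earlier can be illustrated on one small set of figures. The fifteen ratings below are hypothetical. A Counter gives the frequency table (tabular method), a row of asterisks per value stands in for a bar graph (graphic method), and the mean and the proportion of high ratings are numerical measures (computational method).

```python
from collections import Counter

# Hypothetical ratings of fifteen pupils on a five-point scale.
scores = [3, 5, 4, 4, 2, 5, 4, 3, 4, 5, 1, 4, 3, 4, 5]

# Tabular method: the frequency of each rating.
freq = Counter(scores)

# Graphic method: one asterisk per case, a crude bar graph of the table.
for value in sorted(freq):
    print(f"{value}  {freq[value]:>2}  {'*' * freq[value]}")

# Computational method: numerical measures of the whole set.
mean = sum(scores) / len(scores)
prop_high = sum(1 for s in scores if s >= 4) / len(scores)
print(f"mean = {mean:.2f}, proportion rated 4 or 5 = {prop_high:.2f}")
```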

C. THE NATURE OF STATISTICAL DATA 

In general, statistical data are of two kinds. They are derived either from
variables or from non-variables. Consider, for example, the kinds of statistical
information collected about human beings. Census data provide us with 
information concerning the incidence or number of people by geographical 
areas, their ages, distribution with regard to sex, etc. Psychologists and 
sociologists bring together many kinds of information concerning human 
behavior and intelligence. The statistical data of the latter investigations 
consist of various psychological measurements, the frequency of different 
behavior patterns, scores from questionnaires, interest inventories, etc. Some 
of these data are variable and others are non-variable. Let us see what the 
distinction between them is. 

Non-Variable Data 

The incidence of the two sexes in a population provides a common example
of non-variable data. People are either male or female. Sex is a non-variable
attribute. A person can be categorized as either one or the other.
Furthermore, no order is inherent in the arrangement of these two categories; that
is, there is no basis in measurement for putting the male class first and the
female class second, or vice versa. A non-variable attribute is thus one that
exists with respect to distinct categories rather than with respect to a
particular degree.

Non-variable data are often referred to as the data of categories. Categorical
data are generally obtained simply by the enumeration of instances that 
occur, or that are observed to exist, with respect to the classes or categories 
under consideration. 

Variable Data 

In contrast to categorical data, variable data represent quantitative
differences (variation) in the manifestation of a property or trait or attribute. Thus,
the age and height of persons are examples of attributes that are variables.

The essential characteristics of a statistical variable are as follows: (1) The
attribute being studied is capable of quantitative differentiation (at least,
theoretically); (2) the data differentiated have order inherent in their nature,
an order ranging from least to most. Thus, the age of individuals is infinitely
variable (within the age range of human beings), inasmuch as age is susceptible
to quantitative differentiation in terms of years, months, and days.
Furthermore, a collection of age data can readily be brought together with respect
to the order inherent in them, namely, an order that ranges from least age
to most age.

Although sex was seen to be an attribute which yields non-variable rather 
than variable data, it should be observed that the ratio of males to females 
provides a measure, the sex ratio, which is a variable attribute. The sex
ratio is an index that may vary in size for different calendar periods or places. 
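As a small illustration of why the sex ratio counts as a variable, the sketch below computes it for three hypothetical places. Expressing the ratio as males per 100 females is a common convention assumed here, not one stated in the text. Unlike the attribute sex itself, the resulting index can be ordered from least to most.

```python
def sex_ratio(males: int, females: int) -> float:
    """Males per 100 females (a common convention, assumed here)."""
    if females <= 0:
        raise ValueError("females must be a positive count")
    return 100.0 * males / females


# Hypothetical counts for three places: (males, females).
places = {"A": (5100, 4900), "B": (4800, 5200), "C": (5000, 5000)}
for name, (m, f) in places.items():
    print(name, round(sex_ratio(m, f), 1))
```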

The Treatment of Statistical Data 

Most of the methods of statistics have been developed for the treatment
and analysis of variable data. This is because variation has, historically, been
practically synonymous with the concept of statistical phenomena. We have
already seen that Quetelet and Galton pioneered in the nineteenth century
in the development of statistical methods. By and large, the methods they
were responsible for were concerned with the variations characteristic of
human beings and other natural phenomena. We saw that Bowditch, studying
the growth of children, was faced with the problem of somehow relating
two variable attributes, height and weight, but that it remained for Galton
to develop a method of determining the correlation between the
measurements of two such variables. Furthermore, the statisticians of the nineteenth
century were inclined to consider the data of variables as forming a
distribution of measures similar to the normal probability curve. For as large samples
of data of various attributes or characteristics of man and other biological and
social phenomena were observed and measured, the distributions of the
collections of data obtained for a given attribute were often found to approach
the form of this curve. The normal probability curve thus came to epitomize
a fundamental property of a variable. Nevertheless, not all variable
attributes yield distributions of this form. On the other hand, categorical data
do not yield distributions of any kind. This is the case because the essence
of any distribution is an ordered series of measures ranging from the least
degree of the attribute observed or measured, to the most degree. Variable data
are often referred to as the data of variates, and categorical data as the data
of non-variates.

In consequence of the kinds of statistical problems arising during the
nineteenth century and the early part of the twentieth, the bulk of the methods
of statistics developed have been for the treatment of variable data. However,
non-variable data are also important and accordingly some special methods
have been devised for their treatment and analysis.* These methods are
especially relevant to many market research investigations, as well as to
studies in social psychology and sociology. In Chapters 2-4, we shall present
the basic statistical methods for non-variables as developed for problems of
descriptive statistics, and in Chapters 5-9, the fundamental methods that
have been developed for the descriptive treatment of variables. However,
the distinction between these two sets of methods is not always sharp. The
data of variables are sometimes treated by methods developed for categorical
data. For example, in order to determine whether a particular aptitude test
is satisfactory, the criterion therefor may be taken simply in terms of successful
and non-successful performance. Obviously, performance is itself a variable
attribute. However, we often lack satisfactory methods for quantitatively
differentiating degrees of success or non-success and we obtain, at best, broad
non-quantitative distinctions or differentiations of such attributes.

The Mathematical and Logical Implications 
of a Variable—Series 

The essence of a statistical variable resides in the two properties already
mentioned: (1) the capacity of a characteristic or attribute to be quantitatively
differentiated (by some process of measurement or observation), and (2) the
presence of an inherent order in the data. When the statistical data of an
investigation satisfy these two conditions, they yield a series, or scale, of
measures. Such a series of measures ranges in numerical size from least to
highest values. The concept of a series is thus implied by the order inherent
in the quantitatively differentiated data of a variable. Some variables,
however, can be studied and ordered into a series, but not quantitatively
differentiated. Thus, the social interests of a group of people can be rated as “above
average,” “average,” and “below average” (yielding a series with three broad
classes), although there may not be available a satisfactory process of
measurement that will yield quantitative differentiations of varying degrees of the
attribute social interest.

Continuous vs. Discontinuous Series 

Even though the data of a variable may satisfy the two properties of
quantitative differentiation and order, there is a third property characteristic of
such data that gives rise to a distinction among variables themselves. The
data of variables may form either a continuous series of measures or a
discontinuous series.

A continuous series of measures is one that, by definition, is theoretically
susceptible to numerical subdivisions of any degree of fineness. A series of
age data is theoretically capable of such subdivisions. In practice, we may
have no need to differentiate ages to finer degrees than years or months, but
theoretically finer subdivisions in days or hours, etc., could be made. Such
data thus form a continuous series, or continuum, that ranges from the least
observed value to the highest observed value. This is the case, even though
each subdivision in a continuum may not actually have an empirical datum.

* Cf. G. U. Yule and M. G. Kendall, An Introduction to the Theory of Statistics, Griffin,
London, 12th ed., 1940, chaps. l-S.

On the other hand, the data of some variables do not, either in fact or
theoretically, form a continuum, or continuous series of measurements. A
collection of statistical information indicating the number of listeners per radio
program, or the number of children per family, will yield data that satisfy
the two basic properties of a variable, namely, quantitative differentiation
and order. Such variable data may be arranged in a series ranging from the
least number of listeners per radio program to the greatest number of
listeners. However, it is obvious that only integral values can occur; there are
no fractions of radio listeners. A distribution of such data thus yields a series
that is non-continuous. There are real gaps between the integral values lying
within the limits of the series. Such discontinuous series are often referred to
as discrete, and the data of such a series are sometimes called discrete data.
However, the latter terms are likely to be misleading, because discrete data
are often confused with categorical data. It should be clear, however, that
discrete data, as just defined, are the data of a variable rather than of a
non-variable. Categorical data of a non-variable do not have an inherent order
such that they can be arranged in a series of from least to most.

In statistical practice the data of a discontinuous series are usually treated 
as if they formed a continuous series. Thus the average number of children 
per family is usually calculated to a fractionate value, as for example, 3.5, 
despite the fact that such a value is an obvious abstraction. An average is
useful because, for a collection of such data, it indicates that the typical
number of children per family is midway between three and four children.
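The point can be checked with a toy calculation. The six family sizes below are hypothetical whole numbers, yet their average is 3.5, a value no single family can have.

```python
# Hypothetical numbers of children in six families; only whole numbers occur.
children_per_family = [2, 3, 3, 4, 4, 5]

# Treating the discrete series as if continuous: the average is fractional.
average = sum(children_per_family) / len(children_per_family)
print(average)  # 3.5
```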

Exact and Approximate Measures 

The data of statistical investigations are obtained by various methods.
In general, however, the methods may be divided into two classes. (1) A
great deal of statistical information is obtained by the simple method of
enumerating or counting instances. (2) Statistical data are also obtained by
a process of observation and measurement that is more complex than the
method of simple enumeration.

The method of simple enumeration always yields an integral value. Such a 
value is an exact measure except for the possibility of errors in making a count. 
Categorical data are usually obtained by counting “noses,” but the data of 
some variables, such as the number of children per family, are also obtained 
in the same way. 

On the other hand, the data of variables are often approximations. They are
usually obtained by methods that yield estimates of location or position in a
continuous series of values. Most of the measurements in the physical sciences
are approximations obtained by well-defined methods of observation and
measurement. Although they are approximations, they have, from a practical
point of view at least, very small margins of error; in fact, the errors are
often so small that they can be neglected. In psychology and the social
sciences, a test score is an example of an approximate measure. It is usually
obtained by a method of observation and measurement that provides an
estimate of a person’s position in a series or scale of test scores.

By definition, an approximate measure is one theoretically capable of
greater exactness if the methods of measurement are continually refined.
It is apparent that a continuous series of numbers is implied by the concept
of an approximate measure. The fact that a statistical datum may be an
approximation rather than an exact number is not, however, to be interpreted
as belittling its significance or usefulness. On the contrary, the difference
between exact and approximate measures results from the methods used in
obtaining them. As just indicated, the method of the enumeration of instances
(basic to all statistical data of censuses, many market research
investigations, etc.) yields numbers or measurements which are exact in the
sense that they are the result of a count. Nevertheless, insofar as a research
investigation consists of a sample drawn from a population of instances (as
is characteristic of most market research studies), the count made of a
sample is necessarily treated as an approximation of the corresponding
population value. Even though the count of the sample may be an exact number
per se, from the point of view of its use as an estimate of a population value
it is an approximation.
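The distinction between an exact count and its use as an estimate can be sketched with a simulated sample. This is a minimal illustration under stated assumptions: the population of 10,000 households, the 3,000 listeners in it, and the helper `estimate_listeners` are all hypothetical, and simple random sampling without replacement is assumed.

```python
import random

# Hypothetical population of 10,000 households, 3,000 of which listen to a
# given radio program (1 = listens, 0 = does not). Invented for illustration.
POPULATION_SIZE = 10_000
population = [1] * 3_000 + [0] * (POPULATION_SIZE - 3_000)

def estimate_listeners(pop, sample_size, seed):
    """Draw a simple random sample without replacement and return
    (exact count of listeners in the sample, estimated listeners in the population)."""
    rng = random.Random(seed)
    sample = rng.sample(pop, sample_size)
    count = sum(sample)                        # exact: the result of a count
    estimate = count * len(pop) / sample_size  # approximate: scaled-up population estimate
    return count, estimate

count, estimate = estimate_listeners(population, sample_size=400, seed=1)
print(f"listeners counted in the sample: {count}")
print(f"estimated listeners in the population: {estimate:.0f} (true value 3000)")
```

The count is exact for the sample drawn; a different seed (a different sample) will generally give a somewhat different population estimate, which is the sense in which the estimate is an approximation.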

Similarly, the initial measurements obtained from many psychological tests
are based upon a count. Thus, a vocabulary test score may be simply an
enumeration of the number of correctly defined words in a list. As a count of
correct responses, the score is an exact number. However, as an estimate of a
person’s vocabulary ability, it is an approximation, because the particular
list of words used for the vocabulary test is only a sample of the test
material that could be used for such a purpose. Since all psychological tests
necessarily employ but a sample of test material, the measurements of ability
yielded by a test are always approximations and never exact measurements. All
such measures are estimates of people’s positions in a series or scale of test
performance.
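The same reasoning applies to the vocabulary test. In the sketch below, which assumes an invented pool of 2,000 words of which the examinee can define 1,200, each 50-word test form is a different sample from the pool; the function name `vocabulary_score` is hypothetical.

```python
import random

# Hypothetical pool of 2,000 words, 1,200 of which the examinee can define
# (True = can define). Both figures are invented for illustration.
word_pool = [True] * 1_200 + [False] * 800

def vocabulary_score(pool, n_items, seed):
    """Score on one test form: the exact number of words the examinee can
    define among n_items words sampled without replacement from the pool."""
    form = random.Random(seed).sample(pool, n_items)
    return sum(form)

# Three different 50-word forms given to the same examinee.
scores = [vocabulary_score(word_pool, 50, seed) for seed in (1, 2, 3)]
print(scores)                     # each score is an exact count
print([s / 50 for s in scores])   # each proportion only estimates 1200/2000 = 0.6
```

Each score is exact for its own form, yet the forms will usually not agree with one another or exactly with the pool proportion of 0.6.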

EXERCISES 

1. In what sense is statistics a form of applied mathematics?

2. What are the implications of Quetelet’s work for the development of descriptive and sampling statistics?

3. State the different ways in which the concept “statistics” is employed.

4. What is the essential difference between descriptive and sampling statistics? 

5. What is meant by the reduction of data? 

6. What different kinds of methods are utilized for the reduction of data? 

7. Distinguish between a non-variable and a variable attribute. 

8. Distinguish between a continuous and a discontinuous variable. 

9. What is the difference between exact and approximate measures? 



CHAPTER 2 


The Reduction and Organization of 
Categorical Data 

A. INTRODUCTION 

In this chapter we shall present some of the elementary but at the same time
indispensable statistical methods for the treatment of the non-variate type
of data often obtained in psychology, anthropology, sociology, and related
fields. The data of non-variable attributes, or categories, together with
their collection and statistical treatment, are of basic importance to the
research worker, even though a majority of research problems yield variate data.

From the point of view of the practical problems of research, the initial
task to be dealt with is the classification, or division, of large masses of
non-variate data. The logic of classification and division is essential to a
sound use of methods for their reduction and comparison. Just as the
psychologist and related scientists need to know the logic of measurement
underlying the treatment of the data of variables, so they also need to know
how to handle masses of non-variate data which first need to be classified
and the results then described through the use of appropriate statistical
techniques.

We shall consider first the problem of classification. Then methods for the 
reduction of such data to a useful form will be presented. Basically, these 
methods are simply tabulation and enumeration. Methods for the comparison 
of such data will be developed in Chapter 3. These methods consist chiefly 
in the calculation of ratios or rates, such as percentages. Finally, in Chapter 4 
we shall present methods for the correlation of categorical data. 

B. THE CLASSIFICATION AND ENUMERATION OF ATTRIBUTES

Categorical data are enumerated instances of attributes or qualities of
objects or individuals that are taken as existing or not existing, rather than
as existing to some degree. Hence, categorical data are derived from
non-variable attributes, rather than from attributes or qualities that are
variable.

Dichotomous and Polytomous Classifications of Attributes 

Dichotomous Classification 

We saw in the preceding chapter that the sex of human beings constitutes a 
non-variable attribute. People can be identified as either male or not-male. 

This division is a dichotomous, mutually exclusive differentiation of a
qualitative attribute. That is to say, it is a twofold classification of human
beings with respect to an attribute (quality or trait) that can be
differentiated qualitatively (male-kind and not-male-kind), but not
quantitatively. It is a division such that a person can be identified as
being male (the presence of the attribute in question) or not-male (the
absence of the attribute) but not both (the two categories are mutually
exclusive).

In the case of dichotomous divisions there should thus be two distinct,
mutually exclusive classes or categories, as in the case of the attribute of
SEX-KIND. The negative class, not-male, is of course usually identified by
the positive descriptive term female. Although a positive term for the
negative class is not always available for dichotomous classifications, the
positive description of a class is empirically more satisfactory than the
negative, provided no ambiguity results.*

Persons can also be divided into one or the other of the two following
mutually exclusive categories: (1) the blind (total absence of vision), and
(2) the not-blind, despite the fact that the latter class is variable, in that
acuity of vision varies from little to much. Similarly, people can be divided
into the light-haired and the not-light-haired. Here, however, the
differentiation of the characteristics for each category is not so easy
because of (1) the many variations in hair color, and (2) the problem of
establishing satisfactory objective criteria for the appropriate
identification of borderline cases. The extremes in hair color would, of
course, be easy to identify and enumerate, but persons with in-between shades
would be more difficult to classify. In any event, the line of division for an
attribute of this kind would be arbitrary, whereas the distinction between the
blind and the not-blind is not arbitrary.

Polytomous Classification

Eye color is again an attribute that can be divided into two categories, the
blue-eyed and the not-blue-eyed. This time, however, the dichotomy itself
is arbitrary. Eye color is an attribute which may, for research purposes,
be more usefully differentiated into more than two categories. In fact, so far
as it can be correlated with variations in degree of pigmentation, human eye
color may be considered as a variable attribute. But at the present time there
are no entirely satisfactory empirical methods for dealing quantitatively with
this attribute. The usual method for field and laboratory purposes in
psychology and anthropology consists in using a set of artificial eyes
differing in pigmentation. By a matching technique (a person’s eye color being
compared with the colors and shades of the artificial eyes until the best match
is obtained), the color and lightness of an individual’s eyes are identified
with one of several categories or classes that differ in hue as well as in
intensity. Eye color is thus treated as a polytomous attribute, i.e., as
consisting of several exclusive classes of hues which also have varying
degrees of lightness or darkness.

* Technically, dichotomous division is restricted by logicians to a positive
statement of the differentia and its negative: A and not-A.

In order to avoid ambiguities in the differentiation and enumeration of
people with respect to the attribute of hair color, similar matching
techniques are used. In one such method, sample strands of hair are used (braided
like strands of rope); the different colors range from the lightest to the darkest 
shades. The attribute of hair color is thus treated categorically, despite 
the fact that it might be possible to arrange differences in hair color on a 
quantitative scale ranging from least-dark to most-dark. 

Division by Exact Criteria 

The line of division between the categories of an attribute may be arbitrary
but at the same time exact. This is especially true of any attributes or traits
that can be differentiated by enumeration or by a standardized method of
measurement. Thus, men can be divided into two arbitrary but nevertheless
exact classes with respect to the variable attribute of height. This can be
done by taking six feet of stature, for example, as the dividing line between
the two categories, tall-men and not-tall-men. Similarly, animals can be
classified as bipeds or not-bipeds; trucks, as six-wheeled or
not-six-wheeled; children, as living-with-both-parents or
not-living-with-both-parents. All such dichotomies as these have an element
of arbitrariness, but it should be evident that they can be exactly
established, inasmuch as the objects or individuals can be identified with one
or the other of the dichotomized categories by counting legs, wheels, parents, etc.
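A division by an exact criterion can be expressed as a single rule applied to every instance. The sketch assumes invented statures in inches, and it treats a man of exactly six feet as tall; whether the boundary value belongs to the upper or the lower class is a choice the investigator must state, since the text does not settle it. The names `dichotomize` and `DIVIDING_LINE` are hypothetical.

```python
# Hypothetical statures in inches; the six-foot (72-inch) dividing line is the
# one used in the text, but the data are invented.
heights_in_inches = [66.5, 72.0, 70.25, 74.0, 71.9, 68.0]
DIVIDING_LINE = 72.0  # six feet

def dichotomize(values, cut):
    """Assign every value to exactly one of two classes:
    'tall' if value >= cut, otherwise 'not-tall'."""
    return ["tall" if v >= cut else "not-tall" for v in values]

labels = dichotomize(heights_in_inches, DIVIDING_LINE)
print(list(zip(heights_in_inches, labels)))
print(labels.count("tall"), labels.count("not-tall"))  # 2 4
```

Because the rule is a single comparison, every instance falls into one and only one of the two categories.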

In the final analysis, the problem of collecting and enumerating categorical 
data is one of the empirical identification of instances with respect to such 
distinctions among attributes as are relevant to the research investigation. 
If the dividing line of a dichotomized attribute is so vague as to produce 
ambiguities, then the researcher needs to develop more satisfactory criteria 
of division and identification. Dichotomous classifications imply that we can 
determine in which category of a twofold division an object is to be identified. 
Such a classification further implies that a given individual or object can be 
in only one of the two categories of the attribute; in other words, it implies 
that the two categories are mutually exclusive. 

Classification vs. Division 

From a logical point of view, a distinction is made between classification 
and division. If, in arranging the attributes of things, we proceed from the 
whole to its characteristics or traits—from the general to the less general— 
we are engaged in division. Thus, human beings may be divided into two 
groups—males and females—so far as the attribute of sex is concerned. 



In classification, on the other hand, we begin with individual instances and
seek common qualities or attributes (principles of organization) for grouping
the individual instances into two or more relevant classes. Thus, humans
a, b, c, d, . . ., n may be classified into two sex groups, males and females.
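Classification, as distinguished from division, starts from the instances and lets the classes emerge from the values they share. A minimal sketch, assuming hypothetical individuals a through e with a recorded sex attribute; the function name `classify` is illustrative only.

```python
from collections import defaultdict

# Hypothetical individual instances a, b, c, ... with recorded attributes.
instances = {
    "a": {"sex": "male"},
    "b": {"sex": "female"},
    "c": {"sex": "female"},
    "d": {"sex": "male"},
    "e": {"sex": "female"},
}

def classify(items, attribute):
    """Group individual instances into classes by the value they share on
    one attribute; the classes are found in the instances, not fixed in advance."""
    classes = defaultdict(list)
    for name, attrs in items.items():
        classes[attrs[attribute]].append(name)
    return dict(classes)

print(classify(instances, "sex"))
# {'male': ['a', 'd'], 'female': ['b', 'c', 'e']}
```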

It should be apparent that a procedure for division which will be useful 
will depend upon some knowledge of the character of the different kinds of 
individual instances that comprise the whole. Hence, in practice, division is 
often not empirically distinguishable from classification. In any event, the 
results of division and classification are similar in that each gives a logical 
arrangement of an attribute or quality into two or more exclusive categories. 

Science invokes both processes of division and classification in its attempt
to make coherent and intelligible the apparent chaos of natural and social
phenomena; however, uncharted fields of inquiry usually begin with individual
instances and hence with classification. When the subject matter of a field
is not already systematically arranged and interrelated by relevant concepts,
the initial task of an inquiry is to establish criteria for the classification
of individual observations. With the further development of a field of
research, it becomes increasingly important to utilize systems of
classification that have been empirically tested for their general usefulness.
It is out of such verifiable schemes about natural phenomena that the
systematic foundations of a science are made.

Stratification: Classes and Subclasses 

Early students of zoology may have found the classification of all animals 
into the following three categories a useful scheme: 

1. Water animals 

2. Land animals 

3. Air animals 

But with increased knowledge of animals, this trichotomy came to be
increasingly unsatisfactory. Because of ambiguities in the classification of
individual animals, and also because habitat proved less relevant as a
principle of classification, this scheme was finally abandoned in favor of
others that took various attributes of animal structure and function as
criteria for classification. Thus, zoologists today usually classify all
animals into two general categories:*

1. Protozoa (without gastric cavity, germ layers, or tissues) 

2. Metazoa (with gastric cavity, germ layers, and tissues; animals develop
from eggs through cleavage, blastula, and gastrula stages)

This is but a beginning toward the classification of animals. Protozoa are
considered as constituting the first phylum, and Metazoa are divided into
thirteen phyla, or general subclasses. Several phyla of Metazoa are further
subclassified into subphyla, subphyla into classes, classes into orders,
orders into families, families into genera, genera into species, there being
about a half million of the last subclass. Ordinarily, however, an animal is
identified by only its genus and species (the binomial nomenclature introduced
by Linnaeus in the eighteenth century). Thus, human organisms are identified as
of the genus Homo and of the species sapiens. Homo sapiens is of the Primate
Family, which is of the Order Eutheria, which is of the Class Mammalia, which
is of the Subphylum Vertebrata of the Phylum Chordata, which is of the Metazoa
kingdom.

* Cf. E. G. Conklin, General Morphology of Animals, Princeton Univ. Press,
Princeton, 1927.

This zoological classificatory system is cited here because it exemplifies some
of the fundamental problems that arise in the classification of data into
subclasses. The stratification of a radio audience or of a group of voters for
public opinion polls is based upon analogous schemes. It is the principles of
classification that differ.

The Classification of Children's Apperceptive Responses 

Although in many respects psychology, anthropology, sociology, etc., are
beyond the classificatory stage of development, relevant schemes for
classifying the data of an investigation still constitute an initial problem
in much original research. Thus, in the analysis of personality differences in
children, various kinds of projective techniques are being employed today. In
a study by Elizabeth W. Amen,* the responses of 77 pre-school children to each
of a series of 15 pictures were analyzed. After citing numerous instances of
responses by the children, Amen writes:

In the foregoing examples, three major types of responses can be observed: 

(a) A simple naming or other identification of objects. This may be regarded as 
response in terms of static form or enumeration (“A boy, a lady”). 

(b) The description of the picture situation in terms of overt activity (“This little 
girl is eating her breakfast”). 

(c) Inference as to psychological states or inner activity (“A little boy doesn’t want
to eat and his mama’s going to get him to”).

When we study these categories of response, with reference to age level, it is
apparent, from the examples given, that the two-year-olds respond predominantly in terms
of static form. Description in terms of overt activity is more common after three
years, and the suggestion of inner activity, rarely shown at two years, is fairly common
at four years.

In the case of uncharted fields of inquiry, relevant principles of classification 
require a good deal of insight on the part of the investigator. But once the 
categories for classification are well defined, the statistical techniques
necessary for reducing the data and presenting them for comparative purposes are
relatively simple. 

* Elizabeth W. Amen, “Individual Differences in Apperceptive Reaction: A Study of the
Response of Pre-school Children to Pictures,” Genetic Psychology Monographs, 23:319-385,
1941.



Amen’s trichotomous classification of the children’s responses evidently
not only proved to be usable from the point of view of identification and
enumeration of instances (responses), but revealed, as indicated,
distinctions correlating somewhat with age difference. She brings together
the verbal summary of her results, just quoted, in a single table which,
however, omits the enumeration of instances (N) and gives only the mean
percentage of responses for each category by age group. Thus:


Table 2:1. Mean Percentages of Types of Pre-school Children’s
Responses to Pictures*

                                 [Age Groups]
[Category of Response]     2 Yrs.    3 Yrs.    4 Yrs.
Static form                  73        38        23
Outer activity               26        50        51
Inner activity                1        12        26
[Total]                    [100%]    [100%]    [100%]

* Material in brackets not in Amen's original table.


This cross-tabulation of Amen’s results clearly indicates a correlation
between category or type of response and age. Static form response is
predominantly associated with the two-year group. Outer activity response is
associated more with the three- and four-year groups than with the
two-year-olds. Inner activity response is practically absent among the
two-year-olds but accounts for about one-eighth of the responses of the
three-year group and about one-quarter of those of the four-year-olds.
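The percentages in Table 2:1 can be checked and traced across age groups with a few lines. The values below are copied from the table; the dictionary layout and the name `table_2_1` are only one convenient, hypothetical representation. The 12 and 26 for inner activity are the figures read above as roughly one-eighth and one-quarter of the responses.

```python
# Mean percentages from Table 2:1 (Amen, 1941), keyed by age group.
table_2_1 = {
    "2 yrs": {"static form": 73, "outer activity": 26, "inner activity": 1},
    "3 yrs": {"static form": 38, "outer activity": 50, "inner activity": 12},
    "4 yrs": {"static form": 23, "outer activity": 51, "inner activity": 26},
}

# Each age-group column of a percentage table should total 100%.
for age, column in table_2_1.items():
    assert sum(column.values()) == 100, age

# Trace each category of response across the age groups.
for category in ["static form", "outer activity", "inner activity"]:
    trend = [table_2_1[age][category] for age in table_2_1]
    print(f"{category:15s}", trend)
# static form     [73, 38, 23]
# outer activity  [26, 50, 51]
# inner activity  [1, 12, 26]
```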


Rules for Logical Division and Classification

The need for the classification of phenomena is apparent from the
foregoing examples. Natural phenomena, whether they be rocks or people, or the
habits, attitudes, and preferences of people, need to be classified on the basis
of similarities and differences in order that we may perceive and understand
what goes on in an otherwise chaotic world of multitudinous and apparently
unrelated events. The classification of things into categories is itself such a
natural process of the human mind that we often overlook the bases for the
sound classification of phenomena, or we become aware of them long after
having engaged in the process of classification itself. Many children, for
example, grow up with the stereotypes of their social environment and
unthinkingly classify all people of a given skin color as good or bad, or all
people of a given religion as more or less virtuous than other people.

Scientific method in the classification of phenomena aims to assure two
things: (1) that a class or division be so defined and established that there
will be no ambiguity or error in the identification of a thing or event in the
class; and (2) that the exploration and establishment of relations for
phenomena within classes or between two or more classes be done on the basis
of empirical information or evidence rather than on the basis of personal
whim or prejudice. To classify a person as a Negro because of his skin color
is a problem of identification; to conclude by virtue of the classification that
he also has certain habits or attitudes is a separate problem of the
relationship between two or more kinds of attributes or traits. In other words,
there is the problem in scientific method not only of the identification of a
phenomenon with a class, but also of the correct association of the relations
between attributes or characteristics of things or events.

There are three rules of logical division which are also applicable to the
problem of classification. They are usually described as follows:*

1. A division needs to be exhaustive.

In division, a category needs to be provided for every instance or member of
the whole that is being divided. From the point of view of classification, i.e.,
working from individual instances to the whole, it is of course sometimes 
impossible at the outset of an investigation to have all categorical divisions 
perfectly provided for. As data are accumulated, new categories sometimes 
have to be added in order to avoid ambiguities in classification. 

2. The divisions into which the whole is differentiated need to exclude one another.

The import of this rule is no doubt obvious. There should be no overlapping 
of instances from one division or category to another, so far as a given attribute 
is concerned. This exclusion is essential for the elimination of ambiguities 
in identification. 

3. The division should be based upon a single principle of differentiation.

This is the principle of the fundamentum divisionis. Theoretically at least,
division should be made with respect to a single quality or attribute. If a series
of divisions and subdivisions is made, as in the stratification of data, this
principle should be followed in each succeeding level of division. The division
of children into the dull, the bright, and the fair-skinned obviously violates
this rule, since two attributes, mental status and skin color, are involved
at the same level of division. The second rule, that of exclusion, is also
violated, since a child may be both dull and fair-skinned, or both bright and
fair-skinned.
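The first two rules can be checked mechanically once each instance has been identified with the categories it falls into, and the third can be approximated by recording which attribute each category refers to. The sketch below is a hypothetical illustration, not a procedure from the text: the four children, their traits, the "average" category, and the helper `check_division` are all invented. A dark-skinned child of average mental status is included to show that the faulty division also fails to be exhaustive.

```python
# Hypothetical children; each category is an (attribute, label, test) triple.
children = [
    {"name": "Ann", "mental_status": "dull", "skin": "fair"},
    {"name": "Ben", "mental_status": "bright", "skin": "fair"},
    {"name": "Cleo", "mental_status": "bright", "skin": "dark"},
    {"name": "Dan", "mental_status": "average", "skin": "dark"},
]

# The faulty division from the text: two attributes at the same level.
faulty_division = [
    ("mental_status", "the dull", lambda c: c["mental_status"] == "dull"),
    ("mental_status", "the bright", lambda c: c["mental_status"] == "bright"),
    ("skin", "the fair-skinned", lambda c: c["skin"] == "fair"),
]

# A division by mental status alone.
sound_division = [
    ("mental_status", "the dull", lambda c: c["mental_status"] == "dull"),
    ("mental_status", "the average", lambda c: c["mental_status"] == "average"),
    ("mental_status", "the bright", lambda c: c["mental_status"] == "bright"),
]

def check_division(instances, categories):
    """Report whether a division satisfies the three rules for these instances."""
    # For each instance, the labels of all categories it falls into.
    memberships = [
        [label for _, label, test in categories if test(x)] for x in instances
    ]
    return {
        "exhaustive": all(len(m) >= 1 for m in memberships),                # rule 1
        "mutually_exclusive": all(len(m) <= 1 for m in memberships),        # rule 2
        "single_principle": len({attr for attr, _, _ in categories}) == 1,  # rule 3
    }

print(check_division(children, faulty_division))
# {'exhaustive': False, 'mutually_exclusive': False, 'single_principle': False}
print(check_division(children, sound_division))
# {'exhaustive': True, 'mutually_exclusive': True, 'single_principle': True}
```

Exhaustiveness and exclusion are checked only against the instances at hand; as the text notes, new instances may later call for new categories.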

* Cf. M. R. Cohen and Ernest Nagel, An Introduction to Logic and Scientific Method,
Harcourt, Brace, New York, 1934, especially pp. 241, 242; also R. M. Eaton, General Logic,
Scribner’s, New York, 1931, especially pp. 282 ff.

Classification of Judgments, Attitudes, and Opinions

Let us consider the application of these rules to the classification and
division of people’s judgments, attitudes, and opinions. In the comparison of
pairs of weights, A and B, a subject is often doubtful as to which is the
heavier when the difference in physical weight is slight. His judgments are
then frequently identified and enumerated with respect to one or the other
of three categories, viz.,

1. A heavier than B 

2. A lighter than B 

3. Doubtful 

This appears to be a trichotomy (or threefold division), but second thought
suggests that what we really have here is, first, a dichotomous division of
the character of all judgments into certain and doubtful, and second, a
further dichotomous subdivision of the certain judgments into
A-heavier-than-B and A-lighter-than-B. Instead of a true trichotomy, this
situation illustrates the beginning of stratification, i.e., the division of
classes into subclasses, subclasses into sub-subclasses, etc. Thus:

A. CERTAIN JUDGMENTS

1. A-heavier-than-B

2. A-lighter-than-B

B. DOUBTFUL JUDGMENTS
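The regrouping of the three response codes into two strata can be sketched as a recoding step followed by a tally. The ten judgments below are invented, and `stratify` is a hypothetical helper; the point is only that the first-stratum totals (certain vs. doubtful) are recovered by summing over the subclasses.

```python
from collections import Counter

# Invented judgments of one subject, coded with the flat three-category scheme.
judgments = ["heavier", "lighter", "doubtful", "heavier", "heavier",
             "doubtful", "lighter", "heavier", "doubtful", "lighter"]

def stratify(code):
    """Recode a flat response as (first-stratum class, second-stratum subclass)."""
    if code == "doubtful":
        return ("doubtful", None)  # doubtful judgments are not subdivided
    return ("certain", f"A-{code}-than-B")

strata = Counter(stratify(j) for j in judgments)
first_stratum = Counter(cls for cls, _ in strata.elements())

print(first_stratum)  # Counter({'certain': 7, 'doubtful': 3})
for (cls, subclass), n in strata.items():
    print(cls, subclass, n)
```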

In recent years a multiple-choice answer method has been found useful
in the development of attitude and interest questionnaires. E. K. Strong’s
Interest Inventory,* for example, asks whether the examinee likes “driving
an automobile.” The answer can be signified as

L (Like) or I (Indifferent) or D (Dislike) 

The examinee is also asked whether he can write a concise, well-organized 
report. His answer can again be indicated in one of three ways, viz., 

YES or ? (not sure) or NO 

Are these examples of truly trichotomous divisions? The latter is similar to
the example of judging weights in that the examinee’s judgments on this
type of three-choice answer can be divided into sure and not-sure, with a
further subdivision of the sure judgments into yes and no. However, the
three-choice answer of the type L, I, and D (Like, Indifferent, and Dislike)
appears to be in a somewhat different class because each type of answer
expresses an attitude toward the verbalized situation (viz., driving an
automobile). Does this threefold differentiation of attitudes satisfy the
rules for satisfactory logical division and therefore yield a true trichotomy?

In order to answer this question, we need first to determine whether persons’
attitudes are really exhausted by the threefold classification of Like,
Indifferent, or Dislike. Pragmatically, i.e., from the point of view of the
practical problems of research, it appears that many individuals have little
difficulty in shaping their attitudes to fit into one or the other of these
three categories, so long as it is clear that extreme dislike (hate) and
extreme like are not excluded. Sometimes fivefold or sevenfold divisions
(cf. five-point and seven-point attitude scales) are used in the attempt to
obtain finer distinctions in attitude.

* E. K. Strong, Vocational Interest Blank for Men (Revised), Stanford Univ. Press,
Stanford University, 1938.

As for the second rule: In practice there is only the respondent’s initial
difficulty at times in deciding between Like and Indifferent, or between
Indifferent and Dislike. But once his judgment is made, there is no further
difficulty in classifying it. The fundamental ambiguity that arises in
psychological research with such choices hinges on the differences in meaning
that a given type of answer may have for different people. Two persons may
say that they dislike driving an automobile, but the experience of one may 
have been limited to a ten-year-old wreck of a car on corduroy roads, whereas 
the experience of the other may have been a $25.00 fine for speeding on a 
modern highway in a super-streamlined model. The point of course remains 
that both answer that they dislike driving automobiles. For some research 
purposes this “identity” of answer may, however, yield the desired and 
sufficient information. 

It is the third rule that needs most carefully to be considered in examining 
the satisfactoriness of the threefold division (or classification) of attitudes 
into Like, Indifferent, and Dislike. Is this division based upon a single
principle? The attribute or quality in question might be characterized as a
FEELING ATTITUDE FOR A SITUATION. But how about feelings of anxiety, of joy,
hope, sorrow, and pain? And is an attitude of indifference to be classed as a 
feeling attitude for a situation? Perhaps indifference usually means doubtful 
or uncertain. If the latter is the case, then, as already indicated, we have a 
division and subdivision of attitude rather than a trichotomous arrangement. 
The certainty of the attitude would be the first principle of division, and then 
the certain attitudes would be subdivided into attitudes of Like and Dislike. 

In practice, the descriptive terms Like, Indifferent, and Dislike are often
taken as if they yield a trichotomous division of feeling attitude for a
situation and, consequently, as if a single principle of division or
classification is involved. In practice, then, the third rule for division or
classification is not always strictly adhered to, with consequences sometimes
more useful than absurd. We have seen that no single principle but many
principles are used in the biologist’s classification of organisms. If, in the
preceding example, we admit that a feeling attitude of Indifference introduces
a principle of division which is not the same as that used in distinguishing
attitudes of Like and Dislike, the practical gain of such a “trichotomous”
division may nevertheless well offset the failure to satisfy a formal
principle of logic. It is when a
trichotomous division gives misleading or inaccurate information that it 
should be discarded. 

Attitude Scales and Variable Data 

On the other hand, answer choices that express attitudes are often treated
as if they form a continuous scale of variable data ranging from least to most
in a given kind of attitude. If such an approach is attempted with the
attitude choices of Like, Indifferent, and Dislike, the investigator is faced
with the problem of locating attitudes of Indifference on the scale. The scale
could be defined, for the quality of Like, as ranging from Least Like to Most
Like. Dislike attitudes would be in the lower range of such a series, but
Indifference would not fit into the scale at all, inasmuch as it does not
represent any degree of Like. As a matter of fact, it is difficult to see how
Dislike could be in such a scale in a genuine psychological sense. Attitudes of
Dislike are qualitatively different from attitudes of Like. The verbal device
of Least Like for Dislike is only a dodge for casting the results into the form
of a variable. Essentially, as we have already pointed out, attitudes of Like,
Indifference, and Dislike may be treated as if forming a trichotomy or as if
forming a dichotomy of Certain and Not-Certain attitudes, with the Certain ones
further subdivided into a dichotomy of Like and Dislike. Attitudes of Like
may among themselves range from least to most, or from little to much, and
hence be treated as if they form a scale of variable data. Similarly,
attitudes of Dislike may vary in degree of Dislike, and attitudes of
Indifference may vary in degree of Indifference. But to attempt to force all
three kinds of attitudes into the same scale is to belie the basic facts.

An Ambiguous Trichotomy 

In a poll of voters’ opinions, the following question is asked: 

WHOM DO YOU EXPECT TO WIN THE NEXT MAYORALTY ELECTION IN 
NEW YORK CITY? 

The wording of this question suggests that each person polled will have an
expectation as to who will win. However, three types of answers may be
received:

1. Mr. X 

2. Mr. Y 

3. Don’t know (DK) 

Do these three classes of replies constitute a trichotomous division into three exclusive classes? No, because Don't Know here means no expectation, and this is a class or category to be contrasted with that of expectation, not a third alternative alongside Mr. X and Mr. Y.

Instead of the trichotomy suggested by these results, we therefore really have a dichotomy with respect to expectation, and a further dichotomized subclass with respect to the two candidates. In other words, two different strata of replies are involved. Unlike the categorical division of an attribute into two or more classes, which should be exclusive and exhaustive, the categories between different strata (classes and subclasses) are necessarily overlapping. Thus, the attribute of the first stratum is EXPECTATION, with its negative, NO EXPECTATION. These categories are dichotomous because they are mutually exclusive and exhaustive of all such opinions. The second attribute, viz., EXPECTED CANDIDATE TO WIN, may also be stated dichotomously as “Mr. X”



CLASSIFICATION AND ENUMERATION OF ATTRIBUTES 29 

and NOT-Mr. X. If there are only two candidates, the negative of “Mr. X” could be stated positively by the name of his opponent, “Mr. Y.” If, however, there are more than two candidates, we may arrange at least a trichotomous division for the second-stratum attribute. Thus:

EXPECTED CANDIDATE TO WIN

1. Mr. X

2. Mr. Y

3. Other

A tabulation of a poll of voters’ opinions can thus be unambiguously arranged as follows:

A. THOSE WITH AN EXPECTATION (or Opinion)

1. Mr. X

2. Mr. Y

3. Other

B. THOSE WITH NO EXPECTATION (or No Opinion): DK's

Such a stratification of voters’ opinions is not only unambiguous and exhaustive, but essential to an adequate interpretation of the results. For example, the percentages of voters’ opinions need to be considered in relation to the total sample of replies, including those with no expectation; otherwise, if the latter answers are not included, Candidate X might receive a majority of the expectations but in reality be named by fewer than half of all the people polled. Possible ambiguities and misleading interpretations can always be avoided if the researcher makes clear the base used in computing a percentage and at the same time indicates the character of the total sample result.
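To make the point about percentage bases concrete, the short Python sketch below computes Candidate X's share on two different bases. The counts are invented for illustration and do not come from any poll in the text; `percentages` is a hypothetical helper.

```python
# Invented poll results for illustration only.
replies = {"Mr. X": 450, "Mr. Y": 300, "Other": 50, "DK": 200}

def percentages(counts, base_categories):
    """Percentage of each category, computed on the total of base_categories."""
    base = sum(counts[c] for c in base_categories)
    return {c: 100 * counts[c] / base for c in base_categories}

among_expecters = percentages(replies, ["Mr. X", "Mr. Y", "Other"])  # DK's left out
among_all = percentages(replies, list(replies))                       # DK's included

print(among_expecters["Mr. X"], among_all["Mr. X"])  # 56.25 45.0
```

With these made-up counts, Mr. X is named by 56.25 per cent of those who have an expectation but by only 45 per cent of all persons polled, which is the kind of misleading "majority" the paragraph warns against.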

Classification of Don’t Know’s (DK’s) in Market Research 
Investigations 

The frequent occurrence of don’t know responses in market research investigations gives rise to a class of categorical data which needs careful analysis, else ambiguities will creep into the interpretation of results. Generally, we need first to distinguish between DK's obtained from questions of information and DK's obtained from questions of opinion. The DK's for the question, Whom Do You Expect to Win the Next Mayoralty Election in New York City? cited in the preceding section, are of the latter kind; the respondents answering don’t know had no opinion. Now let us consider DK's to questions of information. Generally, a respondent’s DK will be symptomatic of either (1) ignorance or (2) failure to recall a once-known fact.

Lazarsfeld,* who has analyzed the problem of DK's in market research, points out that DK's to questions of information may be generally classified into one or the other of two groups:

I. DK's to questions of information, the answers to which the researcher knows.

II. DK's to questions of information, the answers to which the researcher himself does not know.

* P. Lazarsfeld, The Statistical Handling of the Don't Know's, Office of Radio Research, Columbia University, New York, 1941.

Type I DK 

When an investigator is chiefly interested in ascertaining whether a respondent knows or can recall a fact which the investigator himself knows, the DK's obtained will be of Type I. These are the DK's that occur on most radio quiz programs. Thus, DK's to questions of the following kind are Type I:

WHO IS THE CHIEF JUSTICE OF THE U. S. SUPREME COURT? 

WHO SPONSORS THE WEEKLY RADIO HIT PARADE OF POPULAR SONGS? 
WHAT MAN HAD TWO NON-SUCCESSIVE TERMS IN THE WHITE HOUSE? 

In all these questions, the aim is to find out whether the respondent is acquainted with certain facts which the investigator himself knows. Questions on most college examinations are obviously of this type.

Type II DK 

Type II DK results when the investigator is chiefly interested in obtaining information which he himself does not have and which can usually be obtained most readily by asking an appropriate sample of respondents. Whereas, in questions leading to Type I DK, the investigator wants carefully to avoid giving the respondent any suggestions or clues to the answers, in questions leading to Type II DK he usually attempts to aid the respondent in recalling as accurately and completely as possible the information sought. Psychological questions in personality diagnosis are of this second class. Thus, the investigator asks:

DO YOU HAVE HEADACHES FREQUENTLY? 

Here the fact is not known to the investigator; and if the respondent replies DON'T KNOW, the investigator may attempt to clear up the uncertainty of such an answer by asking the respondent further questions and by indicating more precisely what is meant by the modifier “frequently.”

Questions leading to DK's of Type II occur extensively in market research 
investigations. Thus, 

WHAT KIND OF SOAP DO YOU NOW USE? 

HOW MANY SUITS HAVE YOU PURCHASED DURING THE PAST YEAR? 
WHEN DID YOU PURCHASE THE CAR YOU NOW OWN? 

DO YOU TRADE AT X STORE? 

DK's of Type II are much more difficult to classify than DK's of Type I. The latter are usually unambiguous in their research implications. If a respondent says he does not know who sponsors the weekly Hit Parade of popular songs, it is clear that his DK means ignorance of or failure to recall a fact. From the investigator’s point of view, this DK is final in that it gives him the type of information he wants. If he finds that the great majority of listeners to a sponsored radio program give DK's to this type of question, it is clear that the advertising on the program is not very effective.

On the other hand, DK's of Type II are likely to be ambiguous in their research implications, even though they also may imply ignorance of or failure to recall a once-known fact. Type II DK's give difficulty because the investigator himself usually does not know the answer; hence, the inclusion of such DK's with other classes of answers may give misleading information.

Type II DK's in a Market Research Investigation (Example adapted from Lazarsfeld)

A psychologist conducting an investigation of consumers’ habits and motives asks a sample of 3000 adults of City X:

IN WHAT KIND OF STORE DID YOU BUY THE SHOES WHICH YOU ARE 
WEARING? 

The hypothetical results are tabulated and summarized in Table 2:2. 

Table 2:2. Replies of 3000 Adults to the Question: In What Kind
of Store Did You Buy the Shoes Which You Are Wearing?

Type of Store                Number of Respondents
1. Department store                   1620
2. Chain shoe store                    360
3. Independent shoe store              510
4. Don't know                          510
Total                                 3000


As it stands, this tabulation gives four categories for the attribute TYPE OF STORE. However, if we can assume that the three types of stores named are exhaustive of all possible types, the shoes of the 510 DK's were necessarily bought in one or another of the stores in the first three categories. The DK's thus do not form a fourth, separate (exclusive) category for the attribute TYPE OF STORE. How, then, shall the classification be arranged?

The answer depends upon the nature of the problem under investigation and the consequent direction of research interest. If the basic problem is one of ascertaining whether the 3000 people interviewed could remember where they bought their shoes, then the data in Table 2:2 need to be reclassified into a major stratum for the attribute RECALL, and into a substratum for the attribute TYPE OF STORE, as in Table 2:3.

Table 2:3. Memory of 3000 Adults for Type of Store
in Which Shoes Were Purchased

                                  Recall
Type of Store       Remember   Don't Know
1. Department           1620            ?
2. Chain                 360            ?
3. Independent           510            ?
Total                   2490          510

If, however, the chief research interest is in the fact of where the 3000 pairs of shoes were bought, rather than in the respondents’ ability to recall, the classification of the data in Table 2:2 can be modified as in Table 2:4 so as to yield this information and at the same time clarify the hidden character of the 510 DK's.

The logical possibility of the arrangement in Table 2:4 of the data in 
Table 2:2 is dependent upon the assumption already mentioned, viz., that 
the shoes of the 510 DK's were purchased in one or the other of the three 
types of stores named. If these three types are exhaustive of all classes of 
stores in which shoes are purchased, the logic of the classification is sound. 

Table 2:4. Type of Store in Which 3000 Pairs of Shoes Were Purchased

Type of Store                      Frequency
1. Department store                   1620
2. Chain store                         360
3. Independent store                   510
4. Department or chain or
   independent (DK's)                  510
Total                                 3000


The fact still remains that this fourth category (Department or Chain or Independent Store) for the 510 DK's is not exclusive of the other three. Obviously, it is a class which, in an unknown manner, overlaps with at least one, possibly all, of the three categories of store types. Thus, logically, we have a situation such as is illustrated by Fig. 2:1. The radii and inner circle are drawn with broken lines because the actual breakdown of the 510 DK's is not available and consequently the exact proportions of the three types of stores to the whole are uncertain. The inner circle is drawn to run through all three categories because there is the logical possibility that at least some of the DK purchases were made in each type of store.

Fig. 2:1 (diagram not reproduced)

There remains the research problem of reducing or completely eliminating the DK's by securing more information from the respondents, perhaps by looking for labels in their shoes. In any event, the investigator would not be logically justified in distributing the DK's among the three types according to the proportions of each type to the total number of purchases unless he had empirical evidence (as from a sample of the 510 DK's) to warrant such a breakdown. Although it is possible that the DK's came from respondents who generally trade in all three types of stores, it does not necessarily follow that their shoe purchases would be distributed among the three types in the same proportions as were the purchases of those who remember where they bought their shoes.
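The following sketch, assuming Python and a hypothetical follow-up sample, contrasts the unwarranted proportional distribution of the 510 DK's (based on the 2490 respondents who remember, Table 2:2) with a distribution based on evidence from the DK's themselves. The follow-up counts (20, 20, and 11 out of 51) are invented purely to show how far the two estimates can diverge; `distribute` is an illustrative helper.

```python
# Purchases of the 2490 respondents who remember (Table 2:2).
remembered = {"Department": 1620, "Chain": 360, "Independent": 510}
DK_TOTAL = 510

def distribute(dk_total, counts):
    """Spread dk_total over the store types in proportion to counts (fractional estimates)."""
    base = sum(counts.values())
    return {store: dk_total * n / base for store, n in counts.items()}

# Not justified by the data: assumes the DK's bought like those who remember.
naive = distribute(DK_TOTAL, remembered)

# Justified only by evidence: an invented follow-up sample of 51 DK respondents
# whose shoe labels were checked.
dk_sample = {"Department": 20, "Chain": 20, "Independent": 11}
from_sample = distribute(DK_TOTAL, dk_sample)

for store in remembered:
    print(f"{store:<12} naive {naive[store]:6.1f}   from DK sample {from_sample[store]:6.1f}")
```

Here the naive rule assigns about 332 of the DK purchases to department stores, while the hypothetical DK sample points to 200; nothing in Table 2:2 alone tells us which is nearer the truth.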

The Statistical Frequency 

The statistical data of categories are obtained by (1) the identification of the presence (or absence) of the attribute or trait with its appropriate class or category, and (2) the enumeration or count of the number of individual instances thus identified with each category. The enumeration of all instances in a given category yields the statistical frequency of that category.
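The two steps, identification and enumeration, can be sketched in Python as follows. The coding scheme and the replies are hypothetical; in practice the scheme would be the investigator's own list of categories, such as those of Table 2:2.

```python
from collections import Counter

# Hypothetical coding scheme: verbatim reply (lower case) -> category of TYPE OF STORE.
CODING_SCHEME = {
    "department store": "Department store",
    "chain shoe store": "Chain shoe store",
    "independent shoe store": "Independent shoe store",
    "don't know": "Don't know",
}

def identify(reply):
    """Identification: place one reply in its category (KeyError if the scheme lacks one)."""
    return CODING_SCHEME[reply.strip().lower()]

def statistical_frequencies(replies):
    """Enumeration: count the instances identified with each category."""
    return Counter(identify(reply) for reply in replies)

frequencies = statistical_frequencies(
    ["Chain shoe store", "Department store", "Don't know ", "department store"]
)
print(frequencies["Department store"], frequencies["Don't know"])  # 2 1
```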

Statistical frequencies of categories or of classes constitute the raw data 
of an inquiry. Thus the data in Tables 2:2, 2:3, and 2:4 consist of raw data, 
whereas those in Table 2:1 are not raw data but refined data, i.e., they have 
been treated statistically and are presented as mean percentages. 

Once the principles of classification have been determined for an investigation and the appropriate categories established, the next steps in treating the information available are identification and enumeration.

Enumeration vs. Measurement 

In a broad sense, i.e., in the sense that measurement may be regarded as “the delimitation and fixation of our ideas of things, so that the determination of what it is to be a man or to be a circle is a case of measurement,”* enumeration or counting may be described as a form of measurement. But in a stricter meaning of the term, and as ordinarily employed in sciences using quantitative methods, measurement means the determination of the magnitude or size of an attribute, i.e., the determination of the degree to which an object or an individual instance manifests a property or quality that varies in amount or extent, or from least to most, or from more to less. In this sense, measurement may be defined as “the correlation with numbers of entities which are not numbers.”† Such measurement means the quantitative differentiation of an attribute or quality. It is the kind of measurement

* Ernest Nagel, On the Logic of Measurement, New York, 1930, p. 17.

† Ibid., p. 17.





usually implied as characteristic of the study of individual differences in 
psychology.* 

Stratification: An Opinion Poll

In the development of a student opinion poll of upperclassmen at the 
College of the City of New York during the academic year 1940-1941,† an
attempt was made to increase the representativeness of a 10% sample of 
200 upper-class male students by classifying the 2035 Juniors and Seniors 
with respect to several attributes. The 10% sample was then drawn randomly 
from each category of the established strata, according to the proportion of 
all students identified with each, as will be explained below. 

When each individual (or object) of a group is classified with respect to 
more than one attribute, each will be a member of more than one class; this 
is characteristic of stratification. Thus, each student of the population of 
2035 was classified with respect to the following attributes: COLLEGE CLASS, DEGREE GROUP, ROTC MEMBERSHIP (as well as several other attributes not included here). The principles involved in stratification are the same for any number of attributes as for three. The classification completed, the sample of 200 upperclassmen was drawn randomly and in proper proportions from each combination of categories of the stratified population. It is the development of the latter in which we are interested at this point.

In the stratification of a population for sampling, the first problem is to 
decide upon the attributes to be used. This choice is necessarily limited by 
the available information about the characteristics of the population to be 
studied. Within this limit, the investigator attempts to choose attributes that 
are significantly related to the problems of the investigation. In practice, it 
is usually necessary to make preliminary investigations in order to ascertain 
whether a given attribute (quality or trait) is relevant. In The City College 
opinion poll, the upperclassmen were stratified with respect to COLLEGE
CLASS, DEGREE GROUP, and ROTC MEMBERSHIP on the presumption that
individual differences in these attributes might be associated with differences 
in the opinions to be studied. 

The Schedule of Information 

The investigation began, then, with the population of 2035 upperclassmen. 
The first step consisted in obtaining the necessary information for each 
student and recording it in a form convenient for further use. Ordinarily, 
the construction of a schedule on a 3 by 5 card serves satisfactorily (see
Fig. 2:2). An individual card for each person has the advantage of easy
manipulation and arrangement in the course of the investigation. When 
machines are available, the data can be transferred from the original cards to 
punch cards and a good deal of the sorting and analysis facilitated. 

* Cf. M. R. Cohen and Ernest Nagel, op. cit., chap. 15.

† M. Dreyfuss, The City College Opinion Poll, Honors Research in Psychology, 1941.



Fig. 2:2. Schedule of Information for Stratification of Sample

Name: JOHN JONES                              No. 1908
COLLEGE CLASS:     L.Jr.   U.Jr.   L.Sr.   U.Sr.
DEGREE GROUP:      Arts   Soc. Sci.   Sci.   Bus. Adm.   Tech.   Educ.
ROTC MEMBERSHIP:   Yes _X_   No ___


As indicated in the schedule card in Fig. 2:2, there were the following four categories for the attribute COLLEGE CLASS:

FIRST STRATUM
1. L.Jr. = Lower Juniors
2. U.Jr. = Upper Juniors
3. L.Sr. = Lower Seniors
4. U.Sr. = Upper Seniors

Within each of the four categories of the first stratum there were six different categories for the attribute DEGREE GROUP, as follows:

SECOND STRATUM
1. Arts (B.A.)
2. Social Science (B.S. in Social Science)
3. Science (B.S.)
4. Business Administration (B.B.A.)
5. Technology (Engineering, with several different degrees not differentiated here)
6. Education (B.Ed.)

Finally, for each of the preceding groups of the second stratum there was the following dichotomous substratum for the attribute ROTC MEMBERSHIP:

THIRD STRATUM
1. ROTC (those who were at the time or had been members of the voluntary ROTC Unit at the College)
2. NR (those who were not at the time and had not been members of the ROTC at the College)

The stratification of 2035 students with respect to the three attributes COLLEGE CLASS, DEGREE GROUP, and ROTC MEMBERSHIP thus yielded three strata with four exclusive and exhaustive categories in the first stratum, six in the second, and two in the third. Again it should be noted that within a given stratum the categories are exclusive, whereas between strata they are not exclusive. Each individual is identified with respect to all three attributes: his COLLEGE CLASS, his DEGREE GROUP, and his ROTC MEMBERSHIP.

Exclusive Combinations of Attributes 

This particular stratification of a population yields the possibility of 48 exclusive combinations of attributes, since 4 (classes) × 6 (degrees) × 2 (ROTC) = 48. Membership in any one of these 48 cells is exclusive; that is, a given student can be classified with respect to only one combination of the three attributes. However, there are not necessarily members available for each one of the combinations or cells of a stratified matrix, i.e., the layout of a scheme for stratification.
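A minimal Python sketch of the 48 exclusive cells, using the category abbreviations of Table 2:5; `itertools.product` forms every combination of one category from each stratum. The tuple for Student No. 1908 uses the categories the text gives for him later in the chapter.

```python
from itertools import product

COLLEGE_CLASS = ["U.Sr.", "L.Sr.", "U.Jr.", "L.Jr."]
DEGREE_GROUP = ["Arts", "Soc. Sci.", "Science", "Bus. Adm.", "Tech.", "Educ."]
ROTC_MEMBERSHIP = ["ROTC", "NR"]

# Every exclusive combination (cell) of one category from each stratum.
cells = list(product(COLLEGE_CLASS, DEGREE_GROUP, ROTC_MEMBERSHIP))
print(len(cells))  # 48

# A student is described by one category per attribute, so he falls in exactly one cell.
student_1908 = ("U.Jr.", "Soc. Sci.", "ROTC")
matches = [cell for cell in cells if cell == student_1908]
```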

Statistical Frequencies 

With the necessary information available for each student, the next step consisted in arranging a matrix or table for the systematic tally of the frequencies of students for each combination of attributes. Each student was then identified and tallied in the appropriate cell of the stratified matrix, and the frequencies per cell were summed. If the number of cases in any cell is likely to run into three figures, it is important that the original work sheet for tallying be laid out on a fairly large scale. Since the tallying procedure is the same throughout the work sheet, only part of it is reproduced in Fig. 2:3.

Fig. 2:3. Part of Tally Sheet for Stratified Data (rows: Arts, Social Science, Science; columns: U.Sr. and L.Sr., each divided into ROTC and NR; tally marks not reproduced)

The box method for tallying is used. Thus a box with a diagonal is made for each group of five cases in a cell, as follows: the first case is recorded as one side of a box, the second adds a second side, the third a third side, the fourth closes the box, and the fifth draws the diagonal across it. This method eliminates errors that often arise in the familiar method of four vertical strokes crossed by a fifth. The boxes are easy to count since all cases are quickly identified in groups of five.

The final tabulation with the frequencies summed per cell is presented in 
Table 2:5. The Total column at the right of the table and the two Total 
rows at the bottom bring together succinctly the quantitative information 
for each category of each stratum. A stratified matrix like that in Table 2:5 





METHODS FOR TREATMENT OF ORIGINAL DATA 


37 


has the advantage of presenting a summary picture of all the basic data, and in such a way that it can be readily condensed. It should be noted that the data of a stratified tabulation are cross-tabulated. That is, instead of separate and independent listings of the information for each category of each stratum, the students who are Upper Seniors and members of the Arts degree group and of the ROTC group are indicated together (N = 3). In a similar fashion, each of the remaining 47 cells gives the frequency for one of the other possible combinations of the three attributes. Such a cross-tabulation not only is essential for the determination of the proportion of individuals in each possible combination of the three attributes, but is also the basis for a correlational analysis. (See Chapter 4.)
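A sketch of the tallying and cross-tabulation described above, in Python with `collections.Counter`. The five cards are hypothetical stand-ins for the 2035 real schedules; `totals_by` is an illustrative helper that sums cells to obtain the marginal totals of one stratum, as the Total column and rows of Table 2:5 do.

```python
from collections import Counter

# Hypothetical schedule cards: (college class, degree group, ROTC membership).
cards = [
    ("U.Sr.", "Arts", "ROTC"),
    ("U.Sr.", "Arts", "NR"),
    ("U.Sr.", "Arts", "NR"),
    ("L.Sr.", "Science", "NR"),
    ("L.Jr.", "Tech.", "ROTC"),
]

# Tallying: each card adds one to the cell for its combination of attributes.
cells = Counter(cards)

def totals_by(stratum, cells):
    """Marginal totals for one stratum (0 = class, 1 = degree, 2 = ROTC), summed over cells."""
    totals = Counter()
    for combination, frequency in cells.items():
        totals[combination[stratum]] += frequency
    return totals

print(cells[("U.Sr.", "Arts", "NR")])   # 2
print(totals_by(1, cells)["Arts"])      # 3
print(totals_by(2, cells)["ROTC"])      # 2
```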


Table 2:5. Stratified Classification of 2035 Students by College Class,
Degree Group, and ROTC Membership

                        College Class and ROTC Membership
                 U.Sr.        L.Sr.         U.Jr.         L.Jr.        Totals by
Degree Group   ROTC    NR    ROTC    NR    ROTC    NR    ROTC    NR Degree Group
Arts              3    46       9    38       8    32       8    44          188
Soc. Sci.         9    98      12   123      23   102      32   142          541
Science          20   135      33   140      33   119      38   161          679
Bus. Adm.         1     2       0     0       0     0       0     5            8
Tech.            27   126      31    83      39    86      59   123          574
Educ.             1    11       3    11       1     7       1    10           45
Totals           61   418      88   395     104   346     138   485     N = 2035

Totals by
College Class     479          483           450           623          N = 2035

Totals by ROTC Membership:
ROTC = 61 + 88 + 104 + 138 = 391
NR   = 418 + 395 + 346 + 485 = 1644
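The text does not say how the 200 sample places were divided among the 48 cells. One common way to draw "in proper proportions" is proportional allocation with largest-remainder rounding; the Python sketch below applies that rule, as an assumption rather than the procedure actually used in the poll, to the frequencies of Table 2:5. `allocate_sample` is a hypothetical helper.

```python
from math import floor

# Cell frequencies from Table 2:5, one list per degree group.
COLUMNS = ["U.Sr. ROTC", "U.Sr. NR", "L.Sr. ROTC", "L.Sr. NR",
           "U.Jr. ROTC", "U.Jr. NR", "L.Jr. ROTC", "L.Jr. NR"]
TABLE_2_5 = {
    "Arts":      [3, 46, 9, 38, 8, 32, 8, 44],
    "Soc. Sci.": [9, 98, 12, 123, 23, 102, 32, 142],
    "Science":   [20, 135, 33, 140, 33, 119, 38, 161],
    "Bus. Adm.": [1, 2, 0, 0, 0, 0, 0, 5],
    "Tech.":     [27, 126, 31, 83, 39, 86, 59, 123],
    "Educ.":     [1, 11, 3, 11, 1, 7, 1, 10],
}

def allocate_sample(table, sample_size):
    """Proportional allocation to cells, rounded by the largest-remainder rule (an assumption)."""
    cells = {(degree, column): f
             for degree, row in table.items() for column, f in zip(COLUMNS, row)}
    population = sum(cells.values())
    quotas = {cell: sample_size * f / population for cell, f in cells.items()}
    allocation = {cell: floor(q) for cell, q in quotas.items()}
    shortfall = sample_size - sum(allocation.values())
    # Hand the remaining places to the cells with the largest fractional parts.
    ranked = sorted(cells, key=lambda cell: quotas[cell] - allocation[cell], reverse=True)
    for cell in ranked[:shortfall]:
        allocation[cell] += 1
    return allocation

allocation = allocate_sample(TABLE_2_5, 200)
print(sum(allocation.values()), len(allocation))  # 200 48
```

Because every cell's quota is rounded, small cells may receive no place at all, and the five empty Business Administration cells necessarily receive none. The random drawing of particular students within each cell would follow as a separate step.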






C. METHODS FOR TREATMENT OF ORIGINAL DATA 

The Hand-Sorting of Statistical Data 

We have already indicated that the tabulation and analysis of statistical 
data are often facilitated by the use of an individual schedule, say a 3 by 5 
card, on which the information for each case is systematically recorded. Not 
only is it easier to work with the data from such cards, but the tallying procedure itself can often be simplified by hand-sorting the cards into appropriate classes or categories prior to the actual tabulation.

Thus, the data tabulated in Table 2:5 were obtained from the individual records of 2035 students. Accuracy in tallying, as well as greater ease in the process, was obtained by first sorting the members of the degree groups into their respective categories.

A further advantage that arises from recording the data of each case in a 
study on a separate card is that any part of the whole group of data can 
readily be studied by pulling out the cards of that part. By contrast, if all the 
data of, say, 50 or more cases are recorded on one sheet of paper, not only is 
hand-sorting of the cases impossible but it is very difficult to work with the 
data of only a part of the whole group. In the latter instance it is practically 
necessary to re-record the data of the smaller group on a separate sheet of 
paper in order to avoid errors. 

Machine Tabulation 

In recent years the treatment of statistical data has been greatly facilitated by the use of machines. A special card is used for coding the original data of each case. One type of card, which is reproduced in Fig. 2:4, is an I.B.M. (International Business Machines) card, which will take the data of many attributes or characters, whether they are variables or non-variables. Inspection of the card reveals 80 columns (numbered at the bottom) and 10 rows. Each column thus has a total of 10 positions numbered from 0 through 9.

Fig. 2:4. An I.B.M. Card (card image not reproduced)

The information in Table 2:5 can readily be coded and punched on such cards, and a machine called an Analyzer can quickly bring together the total number of students in each category, as well as the total number in any combination of categories that may be desired. The process of coding such a card will be briefly described for the three attributes in Table 2:5. The first column of the I.B.M. card will be used for coding COLLEGE CLASS. Inasmuch as there were four categories for this attribute, four of the ten positions of Column 1 will be needed. The 0 position can be used for coding those individuals who were Upper Seniors; the No. 1 position for those who were Lower Seniors; No. 2, for Upper Juniors; and No. 3, for Lower Juniors. The degree groups can then be coded in the second column of the card. Six positions will be necessary, 0 to 5. Similarly, ROTC membership and nonmembership can be coded in two positions in the third column. Technically only one position is needed to code a dichotomous attribute. Thus, those students who were members of the ROTC could be coded at 0, and those who were not members would not need to be coded at all (no entry). However, if a dichotomous attribute is coded by the use of only one position on a card, care is necessary to make sure that there are no cases for which information is not available. If this is not checked, all such instances would be counted with that group for which the code was “no entry.”
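The hazard of single-position coding can be shown with a few hypothetical records in Python; `None` stands for a student whose ROTC status is unknown, and the two-position variant assumes NR is punched at position 1.

```python
# Hypothetical records: True = ROTC member, False = not a member, None = no information.
rotc_status = [True, False, None, False]

# One-position code: punch position 0 for members only; everyone else gets "no entry".
one_position = [0 if s is True else None for s in rotc_status]
members_1 = sum(1 for p in one_position if p == 0)
no_entry_1 = sum(1 for p in one_position if p is None)  # includes the unknown case

# Two-position code: 0 = ROTC, 1 = NR; a blank now means "information not available".
two_position = [0 if s is True else 1 if s is False else None for s in rotc_status]
members_2 = sum(1 for p in two_position if p == 0)
nr_2 = sum(1 for p in two_position if p == 1)
unknown_2 = sum(1 for p in two_position if p is None)

print(members_1, no_entry_1)        # 1 3
print(members_2, nr_2, unknown_2)   # 1 2 1
```

With one position, the unknown case silently joins the two non-members; with two positions it remains distinguishable.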

Often the code number of each case is written at the top of the I.B.M. card; however, it may readily be punched. Since there were 2035 students in the population of upperclassmen, the students’ case numbers will require four columns (Columns 77-80). Case No. 1 is punched as 0001, No. 10 as 0010, etc. Fig. 2:4 shows the data for Student No. 1908 coded as follows:


Column 1, Position 2: Upper Junior (College Class)
Column 2, Position 1: Social Science (Degree Group)
Column 3, Position 0: Yes (ROTC Membership)
Columns 77-80, Positions 1, 9, 0, 8: Case No. 1908
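The coding of Student No. 1908 can be modeled as a mapping from column to punched position. In this Python sketch the College Class positions, Social Science = 1, and ROTC = 0 follow the text; the remaining degree-group positions (taken in the order of the second stratum) and NR = 1 are assumptions. `punch` and `count` are hypothetical helpers, and `count` only imitates the kind of tally an Analyzer performs.

```python
# Column 1: College Class (positions from the text).
CLASS_CODE = {"U.Sr.": 0, "L.Sr.": 1, "U.Jr.": 2, "L.Jr.": 3}
# Column 2: Degree Group; Soc. Sci. = 1 is from the text, the rest of the order is assumed.
DEGREE_CODE = {"Arts": 0, "Soc. Sci.": 1, "Science": 2, "Bus. Adm.": 3, "Tech.": 4, "Educ.": 5}
# Column 3: ROTC membership; ROTC = 0 is from the text, NR = 1 is assumed.
ROTC_CODE = {"ROTC": 0, "NR": 1}

def punch(case_no, college_class, degree_group, rotc):
    """Return a card as {column: punched position}; the case number goes in columns 77-80."""
    card = {1: CLASS_CODE[college_class], 2: DEGREE_CODE[degree_group], 3: ROTC_CODE[rotc]}
    for column, digit in zip(range(77, 81), f"{case_no:04d}"):
        card[column] = int(digit)
    return card

def count(cards, criteria):
    """Number of cards punched at every (column, position) pair given in criteria."""
    return sum(all(card.get(col) == pos for col, pos in criteria.items()) for card in cards)

card_1908 = punch(1908, "U.Jr.", "Soc. Sci.", "ROTC")
print(card_1908)  # {1: 2, 2: 1, 3: 0, 77: 1, 78: 9, 79: 0, 80: 8}
```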


Machine tabulation and analysis are particularly useful when there are 
many cases in an investigation and many attributes or characters to be 
studied. The quantitative data of variables can readily be coded by the use of 
two, three, four, or more columns, depending upon the nature of the quantitative data obtained. If, for example, the scores of individuals on a psychology
test were to be coded and the scores ranged from 1 to 99, only two columns 
on the I.B.M. card would be needed. A score of 96 would be represented by 
a slot at the No. 9 position in the first of a pair of columns, and at No. 6 
position in the adjoining column. 
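A two-digit score spread over a pair of adjacent columns, as described above, can be sketched as follows; `punch_score` is a hypothetical helper, and the choice of columns 10 and 11 in the example is arbitrary.

```python
def punch_score(score, first_column):
    """Code a score of 0-99 in two adjacent columns: tens digit, then units digit."""
    if not 0 <= score <= 99:
        raise ValueError("two columns can hold scores from 0 to 99 only")
    tens, units = divmod(score, 10)
    return {first_column: tens, first_column + 1: units}

print(punch_score(96, first_column=10))  # {10: 9, 11: 6}
```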


The Findex System of Coding and Analysis 

Machines for the analysis of card-coded data are expensive and frequently
not accessible to the research worker. Fortunately, there are available on the
market several semi-machine methods for recording and analyzing data 
which are not so expensive. One of them is the Findex System. This consists 
of a special code card and a cabinet with special file drawers into which 
selecting rods can be inserted and the data of any attribute or character or 
combination thereof readily brought together. The method is semi-automatic, 
in that the individual code cards do not have to be hand-sorted and the 
selection of the appropriate data from a set of cards is done not by the eye 
but by a selecting rod. 





A Findex card is illustrated in Fig. 2:5. The two slots on each side of the card are for guide rods which keep the cards in alignment in the file drawer. This particular card is the largest one manufactured by the Findex Company and has a total of 182 positions, there being 14 columns and 13 rows. As already indicated, one position in a card code will serve to record the data of a dichotomous attribute or of a variable which has been dichotomized. Furthermore, combinations of positions can be used to provide many more coding possibilities than the 182 simple positions on this card.

Fig. 2:5. The Findex Card (card image not reproduced)

The method used with the Findex System is as follows: Each statistical datum is represented by a position on the card. An entry at a position is coded by a slot-cutting device. The slots in the card in Fig. 2:5 represent the same data for Student No. 1908 as does the I.B.M. card in Fig. 2:4. The bottom row, position Nos. 1 to 4, has been used for College Classes; in the row above this, position Nos. 11 to 16 represent Degree Groups; and position No. 21 of the third row from the bottom has been used for ROTC membership. Thus, for Student No. 1908:

Position No. 3: Upper Junior

Position No. 12: Social Science

Position No. 21: ROTC Membership

“No entry” for No. 21 may be used to signify non-membership in the ROTC, provided there are no cases in the total group about whom the investigator does not have the relevant information. If there are such instances, then position No. 22 would be used to signify non-membership in the ROTC.

If the investigator wishes to determine how many members of the group 
are Arts students, he places the punched cards in the file drawer and inserts a 
rod at the No. 11 position. The cards are locked in the drawer by rods
inserted through the four slots in the margins of the card. He then tilts the drawer
to an upside-down position and ruffles the cards, i.e., separates them slightly 
from each other by running his fingers along the sides of the cards. All the 
cards on which an entry has been made at position 11 will drop about half an 
inch (the distance of the slot). A lock rod is then inserted through one of the 
holes at the bottom of the card and the drawer is returned to its original 
position. The lock rod prevents the cards which dropped from returning to 
their original position. Thus they can be quickly counted to give the total 
number of Arts students. 

As with the I.B.M. cards, the data of a Findex card can also be
cross-tabulated. This is done by the simultaneous insertion of selecting rods for the
two or more characters which are to be studied. Thus, the total number of 
Arts students who are Upper Seniors and members of the ROTC can readily 
be determined by inserting rods in positions 1, 11, and 21 and repeating the 
operations just described. Only those cards with entries in each of the three 
positions will drop. 
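The rod-selection procedure can be sketched in code, assuming each card is represented simply as the set of positions at which a slot has been cut. Only Student No. 1908's entries (positions 3, 12, 21) come from the text; the other card numbers and their entries are hypothetical, as is the helper name `cards_that_drop`. The position meanings (1 = Upper Senior, 11 = Arts, 21 = ROTC) follow the coding described above.

```python
from typing import Dict, List, Set

# Each card is modelled as the set of positions at which a slot has been cut.
# Card 1908 follows the text; cards 2001-2003 are hypothetical examples.
CARDS: Dict[int, Set[int]] = {
    1908: {3, 12, 21},  # Upper Junior, Social Science, ROTC
    2001: {1, 11, 21},  # Upper Senior, Arts, ROTC
    2002: {1, 11},      # Upper Senior, Arts, no entry at 21
    2003: {4, 11, 21},  # Lower Junior, Arts, ROTC
}


def cards_that_drop(cards: Dict[int, Set[int]], rods: Set[int]) -> List[int]:
    """Return the cards slotted at every rod position.

    Mirrors the text: only cards with entries at all of the rod positions
    drop when the drawer is tilted and the cards are ruffled.
    """
    return sorted(number for number, slots in cards.items() if rods <= slots)


if __name__ == "__main__":
    arts = cards_that_drop(CARDS, {11})
    print("Arts students:", len(arts), arts)
    print("Arts, Upper Senior, ROTC:", cards_that_drop(CARDS, {1, 11, 21}))
```

Cross-tabulation is just a larger rod set: each extra rod can only reduce the number of cards that drop.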

An advantage of the Findex System over a straight machine method is that 
the research worker can keep in closer touch with his original data and can 
examine the details of a case record at any time. For sizable groups of cases,
the method is of course slower than machine analysis, but it is more rapid 
than hand-sorting, especially in the cross-tabulation of two or more attributes. 

EXERCISES 

1. Give five examples, not mentioned in the chapter, of dichotomous non-variable 
attributes. 

2. Give five examples, not mentioned in the chapter, of polytomous non-variable 
attributes. 

3. State the differences between Type I and Type II DA’s. 

a. Give five examples, not mentioned in the chapter, of Type I DA’s. 

b. Give five examples, not mentioned in the chapter, of Type II DA’s.




4. Set up a stratification for the student body of a college or university in terms of 
attributes or characteristics other than the three used in the example from the 
Dreyfuss opinion poll, and state the relevance of each attribute selected to a
research problem on the attitudes or opinions of the student body.

5. Outline a research problem on the opinions or attitudes of the members of your 
community and select three attributes for stratification which might possibly be 
relevant to the research problem outlined. 



CHAPTER 3 


The Comparison of Categorical Data: Proportions, Percentages, Ratios, Index Numbers

A. RATIOS AND PERCENTAGES 

The raw or original data of non-variable attributes are reduced to a form
more manageable for interpretation by means of tabulations and enumerations
into appropriate categories. Such reductions yield statistical frequencies,
described in the preceding chapter. The comparison of two or more sets of
categorical data is further facilitated by the reduction of enumerated values
to appropriate proportions.

The most commonly used proportion is the percentage (from per centum),
which is a proportion taken to a base of 100. That is, a percentage is a
proportion multiplied by 100. Percentages are employed more generally than any
other type of proportion for the comparison of categorical data. When percentage
values for a given type of data repeatedly occur as very small values,
the basic proportions involved are often taken to other bases, as to 1000
(per mille), or 100,000, or 1,000,000.
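A proportion and its percentage, per mille, and other forms differ only in the base to which the proportion is taken. The sketch below is a minimal illustration; the function name `to_base` is hypothetical, and the figures (61 ROTC members among 479 Upper Seniors) are borrowed from the ROTC example later in this chapter.

```python
def to_base(part: float, whole: float, base: float = 100.0) -> float:
    """Express the proportion part/whole relative to `base`.

    base=1 gives a proportion, 100 a percentage, 1000 a per mille value,
    and 100_000 or 1_000_000 the larger bases used for very small values.
    """
    if whole == 0:
        raise ValueError("the whole (denominator) must not be zero")
    return part / whole * base


if __name__ == "__main__":
    # 61 ROTC members among 479 Upper Seniors.
    for base in (1, 100, 1000):
        print(base, round(to_base(61, 479, base), 3))
```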

Since percentage values, per mille values, etc., are always derived from a 
proportion, the development of the latter will be considered first. 

Proportions (p) 

A proportion is a ratio of two numbers, such as the ratio of a part to the
whole. This ratio is usually expressed in decimals. Thus, the ratio 1/2 has a
p value of .50. This may also be written as .5, but proportions are usually
expressed to at least two decimal places.

The value obtained in computing a proportion always depends on the 
number taken for the base. Proportions are often misleading because they 
have not been taken to the base implied in their interpretation. Thus, in the 
comparison of the number of ROTC-Upper Seniors with the number of 
ROTC-Lower Juniors (Table 2:5), the statistical frequencies were found to be 
61 and 138.* 

Will a ratio of 61 to 138 give us the desired proportion for comparing these
two groups? p = 61/138 = .44

* For brevity throughout this discussion, the attribute ROTC will be employed to
indicate, as previously, those students with past as well as present membership in the 
college ROTC unit. 


Before answering this question, let us see just what the value .44 signifies.
Aside from the fact that it means there are somewhat fewer than half as many
persons in the first group as in the second, the value .44 expresses the
mathematical fact that

61:138 :: .44 :1.00 

Sixty-one is to 138 as .44 is to 1.00. In other words, a proportion always 
signifies the ratio of a number taken to a base of 1. If the comparison is turned 
around and the proportion of ROTC-Lower Juniors to ROTC-Upper Seniors 
is obtained, then 

p = 138/61 = 2.26 

This means that there are more than twice as many persons in what is now 
taken as the first group than there are in the second group. Mathematically, 
the proportion 2.26 means that 138 is to 61 as 2.26 is to 1. Thus, in the Lower 
Junior Class there were 2.26 ROTC members for each one ROTC member in 
the Upper Senior Class. 

Whichever way the comparison is made by means of a proportion, it gives 
the ratio of the first number to the second number taken to a base of 1. 

Absolute vs. Relative Comparisons 

Returning to the question of whether .44 gives us the proportion desired 
for comparing the two ROTC groups, we find that the answer depends upon 
the purpose of the comparison. The use of the proportion 2.26 rather than .44 
does not alter the basic facts of the comparison; it merely shifts the direction 
of the comparison. That is, 2.26 indicates that there were more than twice 
as many ROTC-Lower Juniors as ROTC-Upper Seniors, whereas .44 indicates
that there were fewer than half as many ROTC-Upper Seniors as
ROTC-Lower Juniors.

From the point of view of the number of uniforms and other facilities to 
be provided, this absolute increase in ROTC membership in the Lower Junior 
Class has its significance, independently of what is happening to the size of 
each Class as a whole. The proportions .44 and 2.26 compare the absolute 
differences in the sizes of the two ROTC groups, independently of the Classes 
of which each is a part. For most purposes, the comparison of the absolute 
differences would be just as usefully made in terms of the numbers
themselves as in terms of proportions derived from them.

If, on the other hand, it is relevant to compare these two groups in relation 
to the respective Classes of which they are parts, the procedure for
calculation will be different. If the purpose is to determine whether there was a
relative increase as well as an absolute gain in ROTC membership, it will be 
necessary to take into account the difference in the sizes of the Upper Senior 
and Lower Junior Classes. This is done by obtaining: 

1. The proportion of ROTC-Upper Seniors to all Upper Seniors.

2. The proportion of ROTC-Lower Juniors to all Lower Juniors. 



3. The comparison of the resulting proportions, either directly or by means
of a proportion.


Thus:

$$p = \frac{\text{ROTC-U.Sr.}}{\text{All U.Sr.}} = \frac{61}{479} = .13 \quad \text{(a ratio of approx. 1 in 8)} \tag{1}$$

$$p = \frac{\text{ROTC-L.Jr.}}{\text{All L.Jr.}} = \frac{138}{623} = .22 \quad \text{(a ratio of approx. 1 in } 4\tfrac{1}{2}\text{)} \tag{2}$$

Thus, of all Upper Seniors .13, or approximately 1 in 8, were in the ROTC.
Of all Lower Juniors, .22, or approximately 1 in 4½, were in the ROTC.
However, let us make this comparison of the two groups by an additional
proportion:

$$p = \frac{.13}{.22} = .59 \qquad \text{or} \qquad p = \frac{.22}{.13} = 1.69$$

If ROTC-Upper Seniors are compared with ROTC-Lower Juniors, relative
to the size of their respective Classes, there were about three-fifths (.59) as
many ROTC members in the Upper Senior Class as in the Lower Junior Class.
Or if ROTC-Lower Juniors are compared with ROTC-Upper Seniors, there
were (again relative to the size of each Class) about 1.69 times as many ROTC
members in the Lower Junior Class as in the Upper Senior Class.

It is evident, then, that the relative difference—relative to the size of their 
respective Classes as a whole—between the two ROTC groups was not as 
great as the absolute difference. Thus: 

Ratio of Absolute Difference: .44 and 2.26
Ratio of Relative Difference: .59 and 1.69
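The absolute and relative ratios above can be reproduced with a few divisions. This is a minimal sketch; the names are illustrative only. One arithmetic observation, not a claim made in the text: .59 and 1.69 come from dividing the already-rounded proportions .13 and .22, whereas carrying full precision gives about .57 and 1.74. Either way the relative difference is smaller than the absolute one.

```python
def proportion(a: float, b: float) -> float:
    """Ratio of a to b taken to a base of 1."""
    return a / b


# Figures from the text: ROTC members and Class totals.
ROTC_US, ALL_US = 61, 479    # Upper Seniors
ROTC_LJ, ALL_LJ = 138, 623   # Lower Juniors

# Absolute comparison: ROTC groups compared directly, in both directions.
absolute = (proportion(ROTC_US, ROTC_LJ), proportion(ROTC_LJ, ROTC_US))

# Relative comparison: each ROTC group taken as a share of its own Class.
p_us = proportion(ROTC_US, ALL_US)
p_lj = proportion(ROTC_LJ, ALL_LJ)

# Dividing the two-decimal proportions, as the text does ...
relative_from_rounded = (round(p_us, 2) / round(p_lj, 2),
                         round(p_lj, 2) / round(p_us, 2))
# ... versus dividing the unrounded proportions.
relative_exact = (p_us / p_lj, p_lj / p_us)

if __name__ == "__main__":
    print("absolute:", [round(x, 2) for x in absolute])
    print("relative (rounded inputs):", [round(x, 2) for x in relative_from_rounded])
    print("relative (full precision):", [round(x, 2) for x in relative_exact])
```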


An Absolute Difference, with the Absence of a Relative Difference 

An absolute comparison will at times signify a difference—a gain or a 
decrease—whereas, relatively, there is no difference. Thus, Table 2:5 showed 
33 ROTC men who were Science Degree-Lower Seniors, and 38 ROTC men 
who were Science Degree-Lower Juniors. Absolutely, there is a difference of 
5—an increase of 5 ROTC men in the Science Degree-Lower Junior group. 

However, relative to the size of their respective groups (Science-Lower 
Senior and Science-Lower Junior), there was no change. Thus: 


$$\frac{\text{ROTC-Science-L.Sr.}}{\text{All Science-L.Sr.}} = \frac{33}{33 + 140} = \frac{33}{173} = .19$$

$$\frac{\text{ROTC-Science-L.Jr.}}{\text{All Science-L.Jr.}} = \frac{38}{38 + 161} = \frac{38}{199} = .19$$


The proportion of those in each group who were ROTC men therefore remains 
the same when considered in relation to each Science Degree group as a whole. 




An Absolute Increase but, Relatively, a Decrease

It is also possible for there to be an absolute increase from one group to
another and at the same time a relative decrease. Thus, if the ROTC
membership of Science Degree-Upper Juniors is compared with that of the Science
Degree-Lower Juniors on the basis of the data in Table 2:5, the following 
results are obtained. There were 33 ROTC members in the Science Degree- 
Upper Junior Class and 38 ROTC members in the Science Degree-Lower 
Junior Class. In absolute terms, there was an increase of 5 members. But, 
relative to their respective groups as a whole: 


$$\frac{\text{ROTC-Science-U.Jr.}}{\text{All Science-U.Jr.}} = \frac{33}{33 + 119} = \frac{33}{152} = .22$$

$$\frac{\text{ROTC-Science-L.Jr.}}{\text{All Science-L.Jr.}} = \frac{38}{38 + 161} = \frac{38}{199} = .19$$


There was therefore a relative decrease in ROTC membership among the 
Science Degree-Lower Juniors as compared with the Science Degree-Upper 
Juniors. The former group contained 

.19/.22 = .86 of an ROTC member 

to each ROTC member in the Science Degree-Upper Junior group. 
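The two Science Degree comparisons can be reproduced by a short sketch that reports the absolute change in ROTC members alongside each group's ROTC share. The counts are those quoted above from Table 2:5; the group labels and the helper names `rotc_share` and `compare` are illustrative only.

```python
from typing import Tuple

# (ROTC members, non-ROTC members) in three Science Degree groups.
GROUPS = {
    "Science-U.Jr.": (33, 119),
    "Science-L.Sr.": (33, 140),
    "Science-L.Jr.": (38, 161),
}


def rotc_share(group: str) -> float:
    """Proportion of a group's members who are in the ROTC."""
    rotc, other = GROUPS[group]
    return rotc / (rotc + other)


def compare(group_a: str, group_b: str) -> Tuple[int, float, float]:
    """Absolute change in ROTC members from group_a to group_b,
    followed by both groups' ROTC shares to two decimals."""
    absolute = GROUPS[group_b][0] - GROUPS[group_a][0]
    return absolute, round(rotc_share(group_a), 2), round(rotc_share(group_b), 2)


if __name__ == "__main__":
    print(compare("Science-L.Sr.", "Science-L.Jr."))
    print(compare("Science-U.Jr.", "Science-L.Jr."))
```

Both comparisons show the same absolute gain of 5 members; only the shares reveal that the first involves no relative change and the second a relative decrease.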


The ROTC Group Taken as a Whole 

However, we have by no means exhausted the comparisons that can be
made of the data in Table 2:5. For example, if we wish to consider ROTC 
upperclassmen as a whole and ascertain the proportion who are Seniors, as 
compared with Juniors, we use the following procedure: 


Number of ROTC upperclassmen = 61 + 88 + 104 + 138 = 391
Number of ROTC Seniors = 61 + 88 = 149
Number of ROTC Juniors = 104 + 138 = 242

Proportion of ROTC Seniors: p = 149/391 = .38
Proportion of ROTC Juniors: p = 242/391 = .62

Check: The sum of the ratios of all parts of a whole should equal 1.00.

.38 + .62 = 1.00


Thus, nearly two-fifths (.38) of all ROTC upperclassmen were Seniors, whereas
about three-fifths (.62) were Juniors. With respect to each other, then, the
proportion of Junior Class-ROTC Members to Senior Class-ROTC Members
was better than 1½ to 1.

.62 : .38 :: 1.63 : 1.0
since .62/.38 = 1.63




Rounding Off Numbers 

The proportions for the comparisons in the preceding section have been
given to two decimal places. The actual arithmetical operations can, of 
course, be carried much further. However, when the smallest groups involved 
have proportions of .01 or greater, ratios to two (or three) decimal places 
are adequate for ordinary comparative purposes. 

A standardized procedure to be used in determining the value of the last 
figure of the decimal when there is a remainder will now be described. The 
general rules are as follows: 

Rule 1: If, in division, the remainder is less than one-half the number value
of the divisor, the value of the last obtained digit of the quotient is
unchanged. Thus, in dividing .19 by .22 to two decimal places:

$$.1900 \div .22: \quad 1900 = 22 \times 86 + 8 \quad \text{(quotient .86, remainder 8)}$$

Since 8, the remainder, is less than one-half of 22 (the divisor), the quotient
remains .86.

Another way to formulate this rule for rounding off numbers is as follows: 

If the digit to be dropped from the quotient is less than 5, the preceding
digit is unchanged. Thus:

.19/.22 = .863 = .86 

Rule 2: If, in division, the remainder is more than one-half the number value
of the divisor, the value of the last obtained digit of the quotient is increased
by 1. Thus, in dividing 33 by 152 to two decimal places:

$$33.00 \div 152: \quad 3300 = 152 \times 21 + 108 \quad \text{(quotient .21, remainder 108)}$$

Since 108 is more than one-half the value of 152, the quotient is written
as .22.

This rule can also be reformulated as follows: 

If the digit to be dropped from the quotient is more than 5, the preceding 
digit is increased by 1. Thus: 

33./152. = .217 = .22 

Rule 3: If, in division, the remainder is equal to exactly one-half the number 
value of the divisor, 

(a) Leave the value of the last obtained digit of the quotient unchanged, 
if the digit is even. 




(b) Increase the value of the last digit of the quotient by 1, if the digit is 
odd. 

Thus, for (a), in dividing .414 by .12:

$$.414 \div .12: \quad 414 = 12 \times 34 + 6 \quad \text{(quotient 3.4, remainder 6)}$$

Since 6, the remainder, is exactly one-half of 12, and since the value of the
last obtained digit of the quotient is even (4), it is unchanged.

And for (b), in dividing .3885 by .70:

$$.3885 \div .70: \quad 3885 = 70 \times 55 + 35 \quad \text{(quotient .55, remainder 35)}$$

Since 35, the remainder, is exactly one-half of 70, and since the value of the
last obtained digit of the quotient is odd (5), the quotient becomes .56.

This rule for rounding off numbers can be reformulated more simply as 
follows: 

If the digit to be dropped from a quotient is exactly 5 (followed only by 
zeros), make the preceding digit even. Thus: 

.525000000 remains .52; .53500 becomes .54 
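Because Rules 1 to 3 are stated in terms of the remainder, they can be applied with exact integer arithmetic, avoiding any decimal representation issues. The sketch below assumes the dividend has already been scaled by a power of ten so that the last wanted digit of the quotient is its units digit (a convention chosen for this example); `divide_and_round` is a hypothetical helper name.

```python
def divide_and_round(dividend: int, divisor: int) -> int:
    """Integer quotient of dividend / divisor, rounded by Rules 1-3.

    Both arguments are non-negative integers already scaled so that the
    quotient's last wanted digit is its units digit; e.g. .19/.22 to two
    decimals becomes 1900 / 22, and the result is read as hundredths.
    """
    if divisor <= 0 or dividend < 0:
        raise ValueError("expects dividend >= 0 and divisor > 0")
    quotient, remainder = divmod(dividend, divisor)
    if 2 * remainder < divisor:   # Rule 1: remainder under half the divisor
        return quotient
    if 2 * remainder > divisor:   # Rule 2: remainder over half the divisor
        return quotient + 1
    # Rule 3: remainder exactly half the divisor -> make the last digit even
    return quotient if quotient % 2 == 0 else quotient + 1


if __name__ == "__main__":
    print(divide_and_round(1900, 22))   # .19 / .22, read as hundredths
    print(divide_and_round(3300, 152))  # 33 / 152, read as hundredths
    print(divide_and_round(414, 12))    # .414 / .12, read as tenths
    print(divide_and_round(3885, 70))   # .3885 / .70, read as hundredths
```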

This third rule is to be contrasted with the lay practice of always increasing 
the preceding digit by 1 when a 5 is dropped from the quotient. That the 
latter makes for a cumulative error is obvious from the following example: 


Division: Even      Rounding Off to Two Decimals
to 3 Decimals       Correct Method    Incorrect Method
    .265                 .26               .27
    .105                 .10               .11
    .085                 .08               .09
    .205                 .20               .21
    .155                 .16               .16
    .145                 .14               .15
    .005                 .00               .01
    .035                 .04               .04
Σ = 1.000                .98              1.04


The preceding example also illustrates the principle that the sum of the 
proportions of all the parts of a whole should be equal to unity. The method 
used in the last column is less accurate for rounding off the numbers to two 
decimal places than is the method used in the middle column. It should be
noted that with the recommended (“correct”) method the sum of the eight
parts is not exactly equal to unity. This is because there are more even than
odd digits in the hundredths column, so more of the quotients are rounded
down than up. Except in such a circumstance, the sum of the proportions of
all parts of a whole should, of course, exactly equal 1.0.

Dropping More Than One Digit 

If in rounding off a number, several digits are to be dropped, the
preceding rules still apply. The following numbers are rounded off to two decimal
places, as indicated: 

.47896 becomes .48
.47396 remains .47

.45550 becomes .46
.44550 becomes .45

.44500 remains .44
.44450 remains .44

.43500 becomes .44
.43499 remains .43
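The "make the preceding digit even" rule corresponds to the `ROUND_HALF_EVEN` mode of Python's standard `decimal` module, and the lay practice to `ROUND_HALF_UP`. The sketch below passes the values as strings, because many of these decimals cannot be stored exactly as binary floating-point numbers, and rounding a float can then behave differently from the rule. The helper name `round_off` is illustrative; the quotients are those of the cumulative-error table above.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

TWO_PLACES = Decimal("0.01")


def round_off(value: str, rounding: str = ROUND_HALF_EVEN) -> Decimal:
    """Round a decimal string to two places with the given rounding mode."""
    return Decimal(value).quantize(TWO_PLACES, rounding=rounding)


# The eight quotients from the cumulative-error table.
PARTS = [".265", ".105", ".085", ".205", ".155", ".145", ".005", ".035"]

if __name__ == "__main__":
    exact = sum(Decimal(p) for p in PARTS)
    even = sum(round_off(p) for p in PARTS)
    up = sum(round_off(p, ROUND_HALF_UP) for p in PARTS)
    print("exact:", exact, " even rule:", even, " always up:", up)
    for v in [".47896", ".47396", ".45550", ".44550",
              ".44500", ".44450", ".43500", ".43499"]:
        print(v, "->", round_off(v))
```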

B. USE OF PERCENTAGES FOR COMPARING THE PARTS OF
TWO OR MORE WHOLES

We have seen that the sum of the proportions of all parts (categories or
classes) of a given whole should equal 1.0, or unity. The comparison of the
parts of two or more wholes by proportions means that the data of the parts
are reduced to a common base of 1.0. However, in practice, percentage values
are more frequently used than proportions for comparing the composition of
two or more groups. As we have seen, a percentage value is simply a
proportion multiplied by 100. In other words, a percentage is a proportion taken to a
base of 100 instead of to a base of 1.0.

The question sometimes arises as to which “wholes” are to be compared 
and, therefore, what value is to be used in determining a base. Consider, for 
example, the data in Table 2:5 as presented in Table 3:1, with the attribute 
of ROTC membership omitted.* 

The absolute number of students for each combination of College Class 
and Degree Group is given in Table 3:1, and the percentage of students of 
each College Class in each Degree Group is shown in Table 3:2. The latter 
therefore compares the proportion of different Degree students in each
College Class. Whereas 10.2% of all Upper Seniors were Arts Degree students, 
only 8.3% of all Lower Juniors were Arts Degree students, etc. 

* It is to be noted that, whereas Table 3:1 can readily be derived from Table 2:5 as a 
condensation of the latter, the reverse is obviously not possible. The advantage of a
complete work table in classifying the original data of all attributes in an investigation thus
lies in the time and labor saved in not having to go back to the individual record cards for
the data of each comparison to be made. 




Table 3:1. Classification of 2035 Students by College Classes
and Degree Groups

                              College Classes
Degree Groups         Upper    Lower    Upper    Lower   Totals
                     Senior   Senior   Junior   Junior
Arts                     49       47       40       52      188
Social Science          107      135      125      174      541
Science                 155      173      152      199      679
Business Adm.             3        0        0        5        8
Technology              153      114      125      182      574
Education                12       14        8       11       45
Totals                  479      483      450      623     2035


Table 3:2. Comparison of Four College Classes for Differences in Relative
Size of Respective Degree Subdivisions
(Each College Class Taken to a Base of 100 Per Cent)

                              College Classes
Degree Groups         Upper    Lower    Upper    Lower
                     Senior   Senior   Junior   Junior
Arts                  10.2%     9.7%     8.9%     8.3%
Social Science        22.3     28.0     27.8     27.9
Science               32.4     35.8     33.8     31.9
Business Adm.          0.6      0        0        0.8
Technology            31.9     23.6     27.8     29.2
Education              2.5      2.9      1.8      1.8
Totals                99.9%   100.0%   100.1%    99.9%
N =                   (479)    (483)    (450)    (623)


If, on the other hand, it is relevant to compare the degree differences of 
the four class groups, the tabulation and percentages will be as indicated in 
Tables 3:3 and 3:4. Thus, according to the latter table, 26.1% of all Arts 
Degree upperclassmen were Upper Seniors, whereas only 19.8% of all Social 
Science Degree upperclassmen were Upper Seniors, etc. 
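Tables 3:2 and 3:4 are computed from the same counts; they differ only in which "whole" supplies the base. A minimal sketch, using the cell counts of Tables 3:1 and 3:3 (the Upper Junior Social Science count, 125, follows from the row and column totals); the function names `by_class` and `by_degree` are illustrative only.

```python
CLASSES = ("Upper Senior", "Lower Senior", "Upper Junior", "Lower Junior")

# Students in each Degree Group by College Class (counts of Tables 3:1 and 3:3).
COUNTS = {
    "Arts": (49, 47, 40, 52),
    "Social Science": (107, 135, 125, 174),
    "Science": (155, 173, 152, 199),
    "Business Adm.": (3, 0, 0, 5),
    "Technology": (153, 114, 125, 182),
    "Education": (12, 14, 8, 11),
}


def by_class(counts):
    """Each College Class taken as 100 per cent (the layout of Table 3:2)."""
    totals = [sum(row[j] for row in counts.values()) for j in range(len(CLASSES))]
    return {g: tuple(100 * n / t for n, t in zip(row, totals))
            for g, row in counts.items()}


def by_degree(counts):
    """Each Degree Group taken as 100 per cent (the layout of Table 3:4)."""
    return {g: tuple(100 * n / sum(row) for n in row) for g, row in counts.items()}


if __name__ == "__main__":
    col, row = by_class(COUNTS), by_degree(COUNTS)
    for group in COUNTS:
        print(f"{group:15}", [f"{x:5.1f}" for x in col[group]],
              [f"{x:5.1f}" for x in row[group]])
```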


Table 3:3. Classification of 2035 Students by Degree Groups and
College Classes

                                   Degree Groups
Classes              Arts  Soc. Sci.    Science  Bus. Adm.      Tech.      Educ.
Upper Senior           49        107        155          3        153         12
Lower Senior           47        135        173          0        114         14
Upper Junior           40        125        152          0        125          8
Lower Junior           52        174        199          5        182         11
Totals                188        541        679          8        574         45

Table 3:4. Comparison of Six Degree Groups for Differences in the
Relative Size of Respective College-Class Subdivisions

(Each Degree Group Taken to a Base of 100 Per Cent)

                                   Degree Groups
Classes              Arts  Soc. Sci.    Science  Bus. Adm.      Tech.      Educ.
Upper Senior        26.1%      19.8%      22.8%      37.5%      26.7%      26.7%
Lower Senior        25.0       25.0       25.5        0         19.9       31.1
Upper Junior        21.3       23.1       22.4        0         21.8       17.8
Lower Junior        27.7       32.2       29.3       62.5       31.7       24.4
Totals             100.1%     100.1%     100.0%     100.0%     100.1%     100.0%


Although the decision as to which attribute is to be laid off horizontally
and which vertically is somewhat arbitrary, the arrangement in Tables 3:2
and 3:4 is the type ordinarily used. In both tables the wholes are subdivided
into proportionate parts within the columns rather than in the rows.
Therefore, the tables can be read horizontally (by rows) in comparing the
proportions of members of each stratum in a given category of the substratum. Thus,
it is readily observed from Table 3:2 that there is a constant decrease in the
relative size of the Arts Degree group, beginning with the Upper Senior Class
and going on across to the Lower Junior Class. Furthermore, the columns of
this table give the proportionate composition by Degree groups of each
College Class as a whole.

Sometimes the statistical frequency (symbolized by N, the number of cases)
of each cell is included in tabulations like Tables 3:2 and 3:4. However, if
the N's for each cell are not included, the total N's used for the bases of each
whole should always be stated, so that the base for each percentage of a table
will be clear. Moreover, if the base N is omitted and the reader is accordingly
not informed of the sizes of groups being compared, he is likely to make
absurd interpretations. In Table 3:4, for example, 62.5% of the Business
Administration students were Lower Juniors, whereas the highest proportion
of Lower Juniors in any other Degree Group was 32.2% (Social Science). But
the total N for the Business Administration group was only 8; hence the
percentage values for this Degree Group are more likely to be misleading than
useful. In fact, percentage values derived from wholes that are composed of
far fewer than 100 cases should always be interpreted with caution.
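One way to follow this advice mechanically is to print every percentage together with its base N and to flag small bases. In this minimal sketch the cut-off of 100 is an assumption standing in for the text's "far fewer than 100 cases"; it is a rule of thumb, not a fixed limit. The function name `labelled_percentage` is hypothetical, and the counts are the Lower Junior entries of Table 3:4's Degree Groups.

```python
def labelled_percentage(count: int, base: int, min_base: int = 100) -> str:
    """Format count/base as a percentage with its base N, flagging small bases.

    min_base=100 is an assumed cut-off, used only to trigger a caution note.
    """
    if base <= 0:
        raise ValueError("base must be positive")
    pct = 100 * count / base
    note = "  (small base: interpret with caution)" if base < min_base else ""
    return f"{pct:.1f}% (N = {base}){note}"


if __name__ == "__main__":
    # Lower Juniors within two Degree Groups.
    print(labelled_percentage(174, 541))  # Social Science
    print(labelled_percentage(5, 8))      # Business Administration
```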

There is one further difference in the cross-tabulated layout of the two
attributes, College Classes and Degree Groups, in the preceding sets of
tables which should be noted. The horizontal arrangement of College Classes
in Tables 3:1 and 3:2 is in a sequence that has a logical order. That is, in going
from Upper Senior to Lower Senior to Upper Junior to Lower Junior, the
progression is from the college group with the most degree credits to the
group with the least. The order, then, is from most to least. “Concealed” within
these four categories of College Classes is a variable attribute, a variable for 
which the individual instances differ quantitatively in the number of degree 
credits per student at a given time. This variable, however, is ordinarily 
divided for convenience into a few categories and the data treated as if 
categorical. 

The horizontal arrangement of Degree Groups in Tables 3:3 and 3:4, on
the other hand, has no such logical order. The actual order employed is one
corresponding in the main to the number notation for each degree used by
the Registrar’s Office; it is an order with a history but with no inherent
logical structure. Therefore, for purposes of comparison, any other
arrangement of the six categories of Degree Groups might have been used. For
example, it might perhaps have been more useful to arrange these six
categories in the order of the total number of frequencies in each. If this were
done, the order would be:

       Sci.   Tech.   Soc. Sci.   Arts   Educ.   Bus. Adm.
N =    679    574     541         188    45      8

In any event, it should be clear that the arrangement of the four categories 
of College Classes corresponds to a scale or continuum characteristic of a 
variable, whereas in the arrangement of Degree categories there is no such 
inherent order ranging from most to least. The Degree subdivisions are
categories of a non-variable attribute.


C. RATIOS AND INDEX NUMBERS


Per Capita Indices 

Ratios are commonly used to bring together data in such a way that they
can be readily compared by the use of a common “yardstick.” For example,
comparison of the costs of education in a school system from year to year, or
among several school systems or among the states of the Union, can readily
be made by a ratio that gives the cost per capita. In fact, per capita cost
exemplifies one of the most common ratio techniques for indexing statistical
information. Per capita cost is obtained by taking the ratio of the total cost
to the number of individuals represented in the total cost. Thus, if a school
system costs $115,000 a year to operate and the average daily attendance is
1000 pupils, the per capita cost is $115:

$$\frac{\text{Total educational cost}}{\text{Number of pupils}} = \frac{\$115{,}000}{1000} = \$115 \text{ per capita cost}$$

Table 3:5 summarizes the per capita cost of education for several suburban
school systems near New York City, as reported for the school year of 1942-1943.

Table 3:5. Comparative Costs of Several School Systems
(Index: Per Capita Cost for 1942-1943*)

                            Total    Number of Pupils   Per Capita
School System   Educational Costs      (Average Daily         Cost
                                          Attendance)
Mount Vernon        $2,397,336.81                9202      $260.52
Mamaroneck           1,013,516.36                3177       319.02
Great Neck             996,866.07                2980       334.52
Scarsdale              866,881.09                2072       418.38
Port Chester           756,407.93                3743       202.09
Garden City            763,985.23                1803       423.73
Pelham                 627,975.04                1643       382.21
Manhasset              554,650.16                1706       325.12
Bronxville             483,624.96                1159       417.28
Peekskill              443,538.40                2205       201.15


It is apparent, from the data in Table 3:5, that the fairer way of comparing
the educational costs of two school systems is in terms of per capita cost
(last column), rather than absolute cost (Total Educational Costs, second
column). The Mount Vernon School System cost the most, but it served
over 9000 students and consequently ranked eighth of the ten systems in
per capita cost. On the other hand, the Bronxville System ranked ninth in
total cost, but served only 1159 students and was third in per capita cost.
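The per capita index and the two rankings can be checked with a short sketch using the figures of Table 3:5. Rank 1 is assigned to the largest value, an assumption consistent with Bronxville ranking ninth in total cost and third in per capita cost; the helper names are illustrative only.

```python
# Total educational costs and average daily attendance, 1942-1943.
SYSTEMS = {
    "Mount Vernon": (2_397_336.81, 9202),
    "Mamaroneck": (1_013_516.36, 3177),
    "Great Neck": (996_866.07, 2980),
    "Scarsdale": (866_881.09, 2072),
    "Port Chester": (756_407.93, 3743),
    "Garden City": (763_985.23, 1803),
    "Pelham": (627_975.04, 1643),
    "Manhasset": (554_650.16, 1706),
    "Bronxville": (483_624.96, 1159),
    "Peekskill": (443_538.40, 2205),
}


def per_capita(cost: float, pupils: int) -> float:
    """Total cost divided by the number of pupils (average daily attendance)."""
    return cost / pupils


def rank(values: dict) -> dict:
    """Rank 1 = largest value."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {name: i + 1 for i, name in enumerate(ordered)}


if __name__ == "__main__":
    totals = {s: c for s, (c, _) in SYSTEMS.items()}
    pcs = {s: per_capita(c, n) for s, (c, n) in SYSTEMS.items()}
    total_rank, pc_rank = rank(totals), rank(pcs)
    for s in SYSTEMS:
        print(f"{s:13} ${pcs[s]:7.2f}  total rank {total_rank[s]:2}"
              f"  per capita rank {pc_rank[s]:2}")
```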

Per capita ratios provide indices useful for the comparison not only of 
relative costs but also of the relative incidence of many social phenomena,
such as crimes, diseases, various social services, etc. 


Ratios as Index Numbers 

Ratios presenting per capita costs are in reality index numbers; that is, 
they are values that not only index cost but index it with respect to a
yardstick that is useful for the comparison of costs in two localities. Essentially,
this is the purpose of index numbers, whether they are obtained in
educational statistics or in economic statistics or other fields.

The main problem in the development of an index number is determining 
what the base shall be. Even in per capita costs, this problem is not always 
simple. It might appear to be a problem merely of counting the number of 
cases in the group being studied. However, the per capita costs of education 
cited in Table 3:5 were based on the average daily attendance rather than on 
the total enrollment. In order to get such a base—namely, average daily 
attendance—systematic records have to be kept by each classroom teacher 
during the entire school year. When two or more indexes are being directly
compared, it is obviously essential that they all be computed to the same
base.

* Vernon G. Smith, Cost Study and Salary Study: Cities and Villages of the Metropolitan
Area, Scarsdale, New York, 1943. Cf. also, Fortieth Annual Report of the State Education
Department of New York, 1945, vol. 2, pp. 76-79, 148-151.


The I.Q. Index

A commonly used ratio that yields an index of intellectual maturity is
the intelligence quotient, or I.Q. In the Stanford-Binet and other intelligence
tests, this ratio is taken as equal to:

$$\text{I.Q.} = \frac{\text{Mental age}}{\text{Chronological age}}$$

An individual whose I.Q. is 1.00 is therefore one whose mental age (as derived
from the Binet tests) is equal to his chronological age. If a child 8 years old
has a Binet mental age of 10 years and 6 months, his I.Q. index is (years
being converted to months):

$$\text{I.Q.} = \frac{10(12) + 6}{8(12)} = \frac{126}{96} = 1.31$$

On the other hand, if an 8-year-old child has a Binet mental age of 6½ years,
his I.Q. is:

$$\text{I.Q.} = \frac{6(12) + 6}{8(12)} = \frac{78}{96} = .81$$

The I.Q. index is often multiplied by 100 to eliminate the decimal in the ratio. 
The above index numbers would then be 100 (average intelligence), 131 
(above average intelligence), and 81 (below average intelligence). 
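The ratio I.Q. reduces to a conversion to months and one division. This minimal sketch covers only the ratio defined above (mental age over chronological age); the function name `ratio_iq` and its parameter names are hypothetical.

```python
def ratio_iq(ma_years: int, ma_months: int, ca_years: int, ca_months: int = 0,
             scale: int = 100) -> float:
    """Ratio I.Q.: mental age over chronological age, both taken in months.

    scale=100 removes the decimal, as described in the text;
    scale=1 returns the ratio itself.
    """
    mental = ma_years * 12 + ma_months
    chronological = ca_years * 12 + ca_months
    if chronological <= 0:
        raise ValueError("chronological age must be positive")
    return mental / chronological * scale


if __name__ == "__main__":
    print(round(ratio_iq(10, 6, 8)))   # mental age 10 yr 6 mo, age 8
    print(round(ratio_iq(6, 6, 8)))    # mental age 6 yr 6 mo, age 8
    print(ratio_iq(8, 0, 8, scale=1))  # mental age equals chronological age
```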


Standard Scores 


A widely used index of relative ability is the Standard score (see Chapter 8).
It is initially the ratio of a person’s performance on a test, expressed as the
difference between his test score $X$ and the arithmetic mean $M_x$ of the
group, to the standard deviation of the group’s test scores, a standard
measure of their variability symbolized by the Greek letter $\sigma$ (here
$\sigma_x$). This ratio is then converted to a scale whose mean equals 5.0 and
whose standard deviation equals 1.0. Thus:

$$S \text{ (Standard score)} = \frac{X - M_x}{\sigma_x} + 5.0$$

If a person’s score on an ability test is one standard deviation above the mean
of the group, his Standard score index is therefore 6.0.


$$\text{If } X - M_x = \sigma_x, \text{ then } \frac{X - M_x}{\sigma_x} = 1.0 \text{ and } S = 1.0 + 5.0 = 6.0$$
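The Standard score formula can be applied to a whole group at once. In this minimal sketch the test scores are hypothetical (chosen so that the mean is 5 and σ is 2), and σ is taken as the population standard deviation, dividing by N; that choice is an assumption of the sketch, since the text defers the definition of σ to Chapter 8. The function name `standard_scores` is illustrative.

```python
from statistics import mean, pstdev
from typing import List


def standard_scores(scores: List[float]) -> List[float]:
    """S = (X - M) / sigma + 5.0 for each score X in the group.

    sigma is the population standard deviation (dividing by N), an
    assumption of this sketch.
    """
    m = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        raise ValueError("all scores are equal; S is undefined")
    return [(x - m) / sigma + 5.0 for x in scores]


if __name__ == "__main__":
    group = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores: mean 5, sigma 2
    print(standard_scores(group))
```

The score of 7, one standard deviation above the mean, receives an S of 6.0, matching the worked case above.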


Index Numbers as Percentages 

Many index numbers are expressed in terms of a percentage rather than as
a proportion. The method of expression chosen is simply a matter of
convenience in interpretation. For example, the index of the relative proportion
of the sexes in a population, called the sex ratio, is usually expressed as a
percentage, the percentage of males to females. This is illustrated by Table 3:6,
containing population data for eight age groups in New York City.


Table 3:6. Sex Ratio for New York City by Age Groups*
(Based on Data of 1940 U.S. Census)

                                              Proportion of
Age Groups           Males     Females     Males to Females   Sex Ratio
Under 5 yrs.       221,415     212,479                1.042       104.2
5 to 9             238,798     231,758                1.030       103.0
10 to 14           283,453     277,655                1.021       102.1
15 to 19           300,717     306,225                 .982        98.2
20 to 24           304,862     344,291                 .885        88.5
25 to 44         1,302,761   1,383,554                 .942        94.2
45 to 64           836,920     795,688                1.052       105.2
65 and over        187,367     227,052                 .825        82.5
All ages         3,676,293   3,778,702                 .973        97.3


The sex ratio in New York City for children under 5 years of age was 
104.2 in 1940. This means that for every 100 girls there were approximately 
104 boys. On the other hand, the sex ratio was 88.5 for young men and 
women in the 20-24 age group. This means that at this age there were but 
88.5 men for each 100 women. The sex ratio for all age groups, given in the 
bottom row of the table, was 97.3, which means that for every 100 women 
there were approximately 97 men. 
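The sex ratio column of Table 3:6 can be recomputed directly from the counts. This minimal sketch also confirms that the "All ages" row equals the sum of the eight age groups; the function name `sex_ratio` is illustrative.

```python
# Males and females in New York City by age group, 1940 Census (Table 3:6).
POPULATION = {
    "Under 5 yrs.": (221_415, 212_479),
    "5 to 9": (238_798, 231_758),
    "10 to 14": (283_453, 277_655),
    "15 to 19": (300_717, 306_225),
    "20 to 24": (304_862, 344_291),
    "25 to 44": (1_302_761, 1_383_554),
    "45 to 64": (836_920, 795_688),
    "65 and over": (187_367, 227_052),
}


def sex_ratio(males: int, females: int) -> float:
    """Number of males per 100 females."""
    return 100 * males / females


if __name__ == "__main__":
    for age, (m, f) in POPULATION.items():
        print(f"{age:12} {sex_ratio(m, f):6.1f}")
    total_m = sum(m for m, _ in POPULATION.values())
    total_f = sum(f for _, f in POPULATION.values())
    print(f"{'All ages':12} {sex_ratio(total_m, total_f):6.1f}")
```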


D. CONFUSION IN THE USE OF PERCENTAGES 

Misinterpretations or confusion in the use of percentages is likely to occur 
unless the research worker is aware of such possibilities and is careful to avoid 
them. Some of the most common errors in using percentages are illustrated in 
the following examples. 

Confusion in Interpreting a Percentage Increase 

In a recently published study, “How Leading Questions Determine 
Answers,” the following results were reported for two field survey questions. 
Half of the respondents were asked: 

As you know, this war is costing a lot of money. Do you think that
advertising in wartime is a necessary or unnecessary expense?

42% said Necessary 
38% said Unnecessary 
20% said Don’t Know 


* Sixteenth Census of the U.S., Population, 1943, vol. 4, pt. 3, p. 663.

The alternative question asked of the other respondents was: 

Do you think advertising in wartime is necessary or unnecessary? 

64% said Necessary 
26% said Unnecessary 
10% said Don’t Know 

These results were misinterpreted as follows: “Here [for the second question]
the per cent of answers in favor of advertising is 22% higher.” As an
inspection of these data shows, the difference in the percentage figures between the
64% saying “Necessary” to the second question and the 42% saying
“Necessary” to the first question was 22 percentage points. However, the
percentage increase is in reality given by the ratio of 64 to 42:

$\frac{64}{42}(100) = 152.4\%$; and $152.4\% - 100\% = 52.4\%$

This is considerably different from “22% higher.” The percentage of
respondents in favor of advertising was 52.4% higher on the second question than on
the first. The ratio of answers was thus about 1½ to 1.
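The distinction between a difference in percentage points and a percentage increase can be made explicit in two small functions. This is a minimal Python sketch; the function names are illustrative, not the author's notation.

```python
def point_difference(new_pct: float, old_pct: float) -> float:
    """Difference between two percentages, expressed in percentage points."""
    return new_pct - old_pct


def percentage_increase(new_pct: float, old_pct: float) -> float:
    """Increase of new_pct over old_pct, taking old_pct as the 100% base."""
    return (new_pct / old_pct) * 100 - 100


# "Necessary" answers: 42% on the first wording, 64% on the second.
print(point_difference(64, 42))                # prints 22 (percentage points)
print(round(percentage_increase(64, 42), 1))   # prints 52.4 (per cent increase)
```

The same pair of figures thus yields "22" or "52.4" depending on which question is being asked, which is exactly the source of the misinterpretation quoted above.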


A Percentage Decrease Can Never Be More Than 100% 

This is so obvious that little or no reflection should be necessary. However,
this misinterpretation is sometimes made. Thus, if an article regularly costs
$1.00 and the price is increased to $2.00 and then later reduced to 90 cents,
one might in error interpret this as a 110% decrease in price, dividing the
reduction of $1.10 by the original price of $1.00. Actually, of course, the
decrease from the price of $2.00 is 55%. The base for the decrease has to be
the higher cost value, not the original cost value of $1.00.

$\frac{90}{200}(100) = 45\%$; and $100\% - 45\% = 55\%$ (the decrease)


Another example of this type of misinterpretation: A person may have
increased his accuracy score in a card-sorting test, say from an average of
16 errors per trial to an average of only 8 errors per trial. This would be a
gain in the accuracy score of

$\frac{8}{16}(100) = 50\%$

Let us now assume that his accuracy score falls to 24 errors per trial with
the experimental introduction of a “distraction.” Even though there is a
threefold increase in errors, the decrease in accuracy is not 300%

$\frac{24}{8}(100) = 300\%$

but rather 67%:

$\frac{8}{24}(100) = 33\%$, and $100\% - 33\% = 67\%$

The base for the percentage decrease is 24 instead of 8.
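Both examples above follow one rule: a percentage decrease takes the higher value as its base, so it cannot exceed 100% when neither value is negative. The Python sketch below encodes that rule. The function name and the guard against reversed arguments are illustrative additions, and prices are written in cents to keep the arithmetic exact.

```python
def percentage_decrease(higher: float, lower: float) -> float:
    """Decrease from `higher` to `lower`, with `higher` taken as the 100% base.

    With 0 <= lower <= higher the result always lies between 0 and 100.
    """
    if higher <= 0 or lower < 0 or lower > higher:
        raise ValueError("expected higher > 0 and 0 <= lower <= higher")
    return 100 - 100 * lower / higher


# Price example: $2.00 reduced to $0.90 (amounts in cents).
price_cut = percentage_decrease(200, 90)     # 55.0
wrong_cut = (200 - 90) * 100 / 100           # 110.0: wrongly uses the $1.00 as base

# Card-sorting example: errors fall from 16 to 8, then rise from 8 to 24.
accuracy_gain = percentage_decrease(16, 8)   # 50.0
ratio = 24 * 100 / 8                         # 300.0: a ratio, not a decrease
accuracy_drop = percentage_decrease(24, 8)   # about 66.7, i.e. 67%
```

Computing the ratio and the decrease side by side shows why "threefold" and "300%" describe the rise in errors but not the loss in accuracy.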


Confusion Between Percentages and Proportions 

Values of less than 1% are sometimes written to two decimal places without 
a zero being placed to the left of the decimal. Thus, one-half of one per cent 
is sometimes written as .50. That this may readily be confused with the 
proportion of .50 is apparent. Such confusion will be avoided if a zero is 
always placed on the left of the decimal for any figure which is a fraction of 
one per cent. Thus, one-half of one per cent should be written as 0.5%. 


Confusion from Large Percentages 

An incapacity to grasp the implications of very large percentage values is 
another frequent cause of confusion. Thus, a population of 1,000,000 people 
is 5000% as great as a population of 20,000: 


$\frac{1{,}000{,}000}{20{,}000}(100) = 5000\%$


In such a case, it is usually better to indicate that the larger group is 50 times 
as great as the smaller, rather than 5000% as great. 


Percentages from Too Small a Base 

Misleading conclusions are likely to result from percentages computed
from too small a base. As previously pointed out, the expression of a ratio
as a percentage usually implies at least 100 members in the total group for
which the proportion is computed. Examination of Table 3:4 indicates that
62½% of the Business Administration Degree students were Lower Juniors,
that none were Upper Juniors or Lower Seniors, and that 37½% were Upper
Seniors. On the face of it, these appear to be astounding differences. However,
as indicated in the table, these percentages are based on only eight cases,
and consequently the differences are not as significant as they appear.
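When N is small, each individual case carries a large share of the total. The Python sketch below assumes counts of 5, 0, 0, and 3, which are inferred from the 62½% and 37½% of eight cases quoted above; Table 3:4 itself is not reproduced here.

```python
# Counts inferred from the text: 62.5% and 37.5% of eight students.
counts = {"Lower Juniors": 5, "Upper Juniors": 0,
          "Lower Seniors": 0, "Upper Seniors": 3}
n = sum(counts.values())

percentages = {cls: 100 * k / n for cls, k in counts.items()}
points_per_case = 100 / n  # how far a single student moves any percentage

print(percentages)
print(points_per_case)     # prints 12.5
```

With only eight cases, moving one student from one Class to another changes two of the percentages by 12½ points each. This is why such figures look far more decisive than they are.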


Errors in Averaging Percentages 

A fallacious result follows the averaging of percentages unless the size of 
each group being averaged is taken into consideration. In other words, an 
average of percentages needs to be weighted by the respective size of each 
group from which the percentage figures are derived. Let us assume that the 
following four groups of respondents expressed a liking for a motion picture 
according to the percentages indicated for each: 

Group        Percentage Liking Motion Picture

Men                       40%
Women                     60
Boys                      10
Girls                     50

Total =                  160%

$\text{Average} = \frac{160\%}{4} = 40\%$

If each of these four groups of respondents is the same size, then the average 
of 40% is correct. But if the sizes of the groups vary, we cannot be sure that 
40% is a satisfactory average. Each group needs to be weighted by its size 
and the average computed from the weighted percentages, as follows: 


           Percentage Liking          N           Percentages Weighted
Group      Motion Picture      (Size of Group)           by N

Men              40%                1000                40,000
Women            60                 1000                60,000
Boys             10                  500                 5,000
Girls            50                  500                25,000

Total           160%                3000               130,000


With this information, a correct average can now be obtained to represent
the percentage of all respondents liking the motion picture. It is as follows:

$\frac{\text{Total percentages weighted by } N\text{'s}}{\text{Total } N\text{'s}} = \frac{130{,}000}{3000} = 43.3\%$
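The weighting procedure can be written as a small function that multiplies each group's percentage by its N, sums the products, and divides by the total N. The sketch below is a minimal Python version; the function name is illustrative.

```python
def weighted_mean_percentage(groups):
    """Average of group percentages, each weighted by its group size N.

    `groups` is a sequence of (percentage, N) pairs.
    """
    total_weighted = sum(pct * n for pct, n in groups)
    total_n = sum(n for _, n in groups)
    return total_weighted / total_n


# (percentage liking the picture, N) for Men, Women, Boys, Girls
groups = [(40, 1000), (60, 1000), (10, 500), (50, 500)]

unweighted = sum(p for p, _ in groups) / len(groups)   # 40.0
weighted = weighted_mean_percentage(groups)            # 43.33...
print(unweighted, round(weighted, 1))
```

The two averages agree only when every group has the same N. Here the larger adult groups pull the correct figure up to 43.3%.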

E. GRAPHIC METHODS FOR THE PRESENTATION AND 
COMPARISON OF CATEGORICAL DATA 

During the past ten years, the art of portraying statistical information has 
expanded and flourished. That one graph or picture of a statistical result is 
“worth a thousand words,” as the saying goes, serves to emphasize the 
psychological value of a picture that drives home through the immediate 
comprehension of the eye a set of otherwise dull or uninteresting statistical 
tabulations. The art of graphic representation has penetrated our daily press, 
and magazines and pamphlets for the lay public. Today children in the 
secondary schools—even in some elementary schools—not only are familiar 
with statistical portraits but also make them. Particularly popular are graph 
and pictorial techniques for categorical data. 

The purpose of a graph or chart is to tell a story in a simple and intelligible 
but striking manner. So far as possible, the story should be complete. That is, 
the reader should need to have little recourse to text material or to a table of 
statistical information in order to understand the meaning of a graph. A 
graph should therefore have a descriptive title, and the details should be
adequately labeled so that the statistical results to be conveyed to the reader
can be understood immediately.

The chief types of graphs or charts for categorical data are (1) bar charts,
(2) belt graphs, (3) pie diagrams, (4) maps, and (5) pictorial charts. In
addition, line graphs and belt graphs are used to portray trends in categorical
data over a period of time. With respect to the passage of time, the data of a
non-variable attribute per se may thus become a variable, as, for example,
the changes from year to year in the number or proportions of the people in
a geographical area engaged in various occupations. Learning curves are
examples of time series, for which line graphs are usually employed.

GRAPHIC METHODS FOR CATEGORICAL DATA 


Bar Graphs 

The simplest but not always the most interesting type of graph for
categorical data is the bar graph. The bars may be drawn vertically or
horizontally. Their length is scaled according to the frequencies or
percentages of the categories to be shown. Their width is mainly a matter
of aesthetics; that is, it is governed by what pleases the eye rather than by
any logical rule other than uniformity of width for the several bars of a
given chart.

Fig. 3:1. Comparison, by Age Groups, of Pre-School Children's Types of Verbal
Responses to Pictures (Amen's Data)

[Figure: horizontal bars, Percentage of Response Types by Age Groups.
Two-Year-Olds: S.F. = 73%, O.A. = 26%, I.A. = 1%. Three-Year-Olds: S.F. = 38%,
O.A. = 50%, I.A. = 12%. Four-Year-Olds: S.F. = 23%, O.A. = 51%, I.A. = 26%.
Key: S.F. = Static Form; O.A. = Outer Activity; I.A. = Inner Activity.]

Figs. 3:1 and 3:2 are horizontal and vertical bar graphs, respectively, for
Amen’s data in Table 2:1. Both graphs are based on the same data and each
tells a similar over-all story, but with a different emphasis. The horizontal
bars in Fig. 3:1 are arranged by successive age groupings to emphasize the
differences in the total composition by response types for each age group. The
vertical bars in Fig. 3:2 are arranged by successive response types in order to
emphasize the different proportions of the three types of response among the
three age groups.
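The rule that bar length is proportional to the percentage, while width is uniform, can be shown in a text-mode analogue of Fig. 3:1. This Python sketch is illustrative only. The percentages are read from the bar labels of Fig. 3:1, one text row stands for the uniform bar width, and the 50-character scale is an arbitrary choice.

```python
# Amen's percentages as labeled in Fig. 3:1 (S.F. = Static Form,
# O.A. = Outer Activity, I.A. = Inner Activity).
AMEN = {
    "Two-Year-Olds": {"S.F.": 73, "O.A.": 26, "I.A.": 1},
    "Three-Year-Olds": {"S.F.": 38, "O.A.": 50, "I.A.": 12},
    "Four-Year-Olds": {"S.F.": 23, "O.A.": 51, "I.A.": 26},
}


def bar(pct: int, width: int = 50) -> str:
    """A bar whose length is pct/100 of `width` characters, rounded half up."""
    return "#" * ((pct * width + 50) // 100)


def horizontal_bar_chart(data, width: int = 50) -> list:
    """One heading line per age group, then one bar per response type."""
    lines = []
    for group, types in data.items():
        lines.append(group)
        for code, pct in types.items():
            lines.append(f"  {code:<5}{bar(pct, width)} {pct}%")
    return lines


print("\n".join(horizontal_bar_chart(AMEN)))
```

Grouping the rows by age, as here, corresponds to the emphasis of Fig. 3:1. Looping over response types first would give the arrangement of Fig. 3:2.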

Sometimes the spaces within the rectangles in bar charts are utilized for 
descriptive phrases and notations, as in Fig. 3:1. Or the rectangles may be 
cross-hatched or shaded in order to differentiate subdivisions within categories 
or to contrast categories themselves; their length is scaled at the side, and 
the descriptive phrases for each subdivision are keyed on the chart, as in 
Fig. 3:2. 

Fig. 3:2. Comparison of the Relative Incidence of Types of Verbal Responses to
Pictures Among Pre-School Children (Amen's Data)

[Figure: vertical bars grouped by TYPE OF VERBAL RESPONSE (Static Form, Outer
Activity, Inner Activity). Key: 2-Year-Olds, 3-Year-Olds, 4-Year-Olds.]

These types of charts are used for statistical frequencies as well as for
proportions. When contrasting colors can be employed for the different
categories or subdivisions to be compared, the result is considerably more effective
than the black-and-white patterns used in these figures. When black-and-white
patterns have to be employed in photographic or printed reproductions,
the best effects can be obtained by using already prepared paper that comes
in a great variety of patterns and can be cut to fit the surface area that is to
be shaded or cross-hatched. Such prepared paper was used in these, as well
as in several of the ensuing figures.

Bar Charts for the Proportions of Wholes 

A further bar chart technique for the graphic comparison of Amen’s data
on pre-school children’s responses to pictures is shown in Fig. 3:3. This type
of graph is useful for comparing the proportionate composition of two or more
groups (wholes) (a) when the number of instances (N) is the same for each,
or (b) when the data in each categorical group have been reduced to
percentages of the whole. Fig. 3:3 exemplifies the latter. The length of each
horizontal rectangle is taken as equal to 100% and the proportions of each
type of response are laid off to scale and differentiated by contrasting
patterns within the area of the rectangle. Although pie charts, as we shall see,
are also commonly employed to differentiate the proportionate parts of a
whole, rectangles or bar charts are more suitable for the types of
comparisons made in Fig. 3:3. It is easier to compare the composition of several
categories shown in a series of rectangles in the same vertical plane than when
the categories are shown in several pie charts.

Fig. 3:3. Comparison of the Relative Incidence of Types of Verbal Responses to
Pictures Among Two-Year, Three-Year, and Four-Year-Old Pre-School Children

[Figure: one 100% horizontal bar per age group; axis: Per Cent of Response
Type; key for response types (legible entries: Outer Activity, Inner Activity).]

Of the three types of bar charts for Amen’s data shown in these three 
figures, the last is the most effective in the way in which it tells the whole 
story and emphasizes both (a) the response-type composition of each age 
group (reading across the chart) and (b) a comparison of the proportionate 
incidence of each response type for the successive age groups (reading down 
the chart). 

Bar Trend Graphs 

Charts of the type shown in Figs. 3:4 and 3:5 emphasize the change in the
composition of successive categories. These are trend graphs and their
construction is based on the bar chart principle. They may be employed with
either frequencies or proportions. The categories in these particular charts
are the four groups of City College upperclassmen. The changes in the
composition of each College Class by students’ degree objectives are portrayed
on the basis of the data in Tables 3:1 and 3:2. Because of the few students
choosing Education or Business Administration as a degree objective, these
two subdivisions have been combined into “Other” at the top of each
rectangle.

Fig. 3:4 is based on the actual frequencies of each category and its
subdivisions. Hence the changes described are absolute, rather than relative to
the size of each category. The frequencies are scaled at the left and right of
the chart. The choice as to which subdivision is to be represented at the base
of the rectangles, which next, etc., is of course arbitrary. However, once the
choice is made for the first one (Lower Juniors), the same order must be
maintained throughout. The order of the arrangement of the five degree-
objective subdivisions in this figure is based on the size of each in the Lower
Junior Class: the Science subgroup had the most students; Technology was
next; Social Science third, etc. To facilitate the actual drawing of the
rectangles to the frequency scales at the left and right, the number of students
per subdivision for each Class can be cumulated as follows (for the Lower
Juniors): Science = 199; Science 199 + Technology 182 = 381; Science 199
+ Technology 182 + Social Science 174 = 555; these 555 + Arts 52 = 607;
and these 607 + the 16 “Other” = 623.

Fig. 3:4. Student Composition of Upperclass College Groups According to Their
Different Degree Objectives

[Figure: stacked vertical bars of frequencies for Lower Juniors, Upper Juniors,
Lower Seniors, and Upper Seniors; horizontal axis: COLLEGE CLASSES.]
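The cumulation just described is a running sum. The Python sketch below computes the top and bottom of each stacked segment for the Lower Juniors, using the five frequencies quoted above in the stacking order of Fig. 3:4.

```python
from itertools import accumulate

# Lower Junior frequencies, in the stacking order used for Fig. 3:4.
lower_juniors = [("Science", 199), ("Technology", 182), ("Social Science", 174),
                 ("Arts", 52), ("Other", 16)]

# Running totals give the top of each segment; each bottom is the previous top.
tops = list(accumulate(n for _, n in lower_juniors))
bottoms = [0] + tops[:-1]

for (name, _), lo, hi in zip(lower_juniors, bottoms, tops):
    print(f"{name:<15}{lo:>4} to {hi:>4}")
```

The last top equals the Class total (623), which is a quick check that no subdivision has been omitted.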

It will be observed that the emphasis in Fig. 3:4 is on gross changes in
frequencies for the degree subgroups of the four categories of Classes. For a
more accurate comparison of the change in frequencies, the original data
(Table 3:1) must also be employed. Figure 3:4 gives only an over-all view of the
situation, but it shows at a glance whether there are any marked changes in
the situation from one Class to another.

Fig. 3:5 is similar to Fig. 3:4 in that it also indicates the changes in the 
composition of the degree subgroups for the four categories of upperclassmen. 
This time, however, the relative changes are emphasized by the use of
proportions (percentages) instead of frequencies. Each of the four Classes is used as
the base, 100 per cent, and the proportions of students in each subdivision 
(degree objective) are indicated. Despite the greater absolute size of the 
Lower Junior Class, as seen in Fig. 3:4, it is apparent from Fig. 3:5 that 
relatively there is little change in the degree-objective composition of the 
four upperclass groups of the 2035 City College students. 
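Converting a Class's frequencies into percentages of its own total is what turns Fig. 3:4 into Fig. 3:5. The Python sketch below does this for the Lower Juniors only, because those are the only Class frequencies given in this section. The function name is illustrative.

```python
def composition_percentages(freqs: dict) -> dict:
    """Express each subdivision as a percentage of the Class total (100%)."""
    total = sum(freqs.values())
    return {name: 100 * n / total for name, n in freqs.items()}


lower_juniors = {"Science": 199, "Technology": 182, "Social Science": 174,
                 "Arts": 52, "Other": 16}
shares = composition_percentages(lower_juniors)

for name, pct in shares.items():
    print(f"{name:<15}{pct:5.1f}%")
```

Because each Class is reduced to a 100% base, Classes of different sizes can be compared directly. This is the point of the paragraph above.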

Fig. 3:5. A Comparison of the Relative Composition of Upperclass College Groups
According to Their Degree Objectives

[Figure: one 100% vertical bar per Class; horizontal axis: COLLEGE CLASSES.]

Bar Graphs for Relationships 

Fig. 3:6 illustrates the use of vertical bar graphs in comparing two sets of
percentage values for several categories, in this case 12 states in the Union.
The relationship between per capita income and per capita tax expenditures
for public school education is shown. Each is taken as a percentage of the
United States average as the base (projected at 100% on the chart by the
horizontal broken line).

It is apparent that the relationship between income and tax expenditures 
for public school education in these 12 states is strikingly close. Thus, per 
capita income in Alabama, Arkansas, and Mississippi is about 40% of the U.S. 
average, as is also per capita tax expenditure for public school education. In 
New York State, at the other extreme, per capita income is about 160%, and 
per capita tax expenditure for public school education about 165%, of the 
U.S. average. 


Fig. 3:6. Showing the Relationship Between Income and Public School Tax
Expenditures per Capita in New York and Other States, 1935*

[Figure: paired vertical bars for 12 states, each value a percentage of the
U.S. average (100%, horizontal broken line).]

* From “Public School Costs in New York and Other American States,” Public Education
Information Bulletin, vol. 14, No. 2, 1939. New York State Teachers Association, Albany,
New York. Reproduced by permission of A. J. Burke, Director of Studies.


Belt Graphs 

When emphasis in a chart is to be focused on changes in the composition
of a category that occur over a period of time or in continuously successive
stages, as is characteristic of time series (changes in an attribute per unit of
time over a period of time), the bar trend graphs in Figs. 3:4 and 3:5 are
converted into belt graphs of the type illustrated in Figs. 3:7 and 3:8. Instead
of separated rectangles being used, the continuity and trend of the statistical
information are emphasized by plotting line graphs and shading with
contrasting patterns the areas of each subdivision.

Fig. 3:7 shows the trend in the occupational composition of New York
City’s population of employed persons over a period of five decades.
Successive decades are plotted on the base line and at the top of the chart. The
frequency scale of persons employed is drawn in units of 100,000 at the right.
Each major occupational category is keyed in the upper left-hand corner
of the chart. The total number of employed persons each decade is stated on
the chart and is taken as the base for the percentages keyed on the body of
the chart. Thus both the absolute and the relative changes are presented.
The category Domestic and Personal Service was not differentiated from
Clerical Occupations until 1910, and Public Service was also differentiated
at that time. The greatest absolute increase in employed persons was in the
Manufacturing and Mechanical Industries, and the greatest relative increase
was for the original category that combined Public Service and Professional 
Service prior to 1910. 

Fig. 3:7. Distribution of Persons by Occupational Groups, New York City,
1890-1930*

[Figure: belt graph of employed persons by occupational group, by decade;
credit line partly legible: Mayor's Committee ... City Planning.]

* Reproduced by permission of the Institute of Public Administration, New York City.
Fig. 3:8. Age Composition of Population, New York City*

[Figure: belt graph; vertical axis in millions; horizontal axis 1900, 1910,
1920, 1930, 1940, 1950, 1960.]

* Reproduced by permission of the Institute of Public Administration, New York City.


Fig. 3:8 is a belt graph showing changes and trends in the age composition
of New York City’s population since 1900. Although age is a variable
attribute, the original data have been organized into five broad categories (or
class intervals) and the trend of each is indicated in frequencies, scaled in
millions at the left of the chart.†

† This particular graph, made in the ’30’s, is interesting because of its projection of the
anticipated composition of the population in 1940, 1950, and 1960. The 1940 U.S. Census
returns gave New York City a total population of nearly seven and a half million (in
contrast to the predicted seven million).

Age is an attribute that figures importantly in practically all polls of public
opinion. Therefore knowledge of a population’s age composition by broad
categories is essential to the stratification of this variable in sampling statistics.

Pie Diagrams

Circular charts have long been used to portray the relative size of the parts
of a whole. Each category, or subdivision, is represented by a pie-shaped
piece (a sector of a circle) drawn to give an area equal to each category’s
appropriate share of the whole area.

In Fig. 3:9 a pie diagram is used to show the relative importance of consumer
expenditures in New York State prior to World War II. The name of each
category and its relative size are indicated within each sector. A semicircular
protractor scaled in percentages as well as in degrees facilitates the
construction of such a chart. If percentage calibrations are not available on a
protractor, the percentage values of each category or subdivision must be
converted into degrees. For example, 25% would be represented by a sector
with an angle of 90° [since .25(360°) = 90°], which is a quadrant of a circle.

Fig. 3:9. Showing the Relative Importance of Various Consumer Expenditures
in New York State*

[Figure: pie diagram of consumer expenditure categories.]

* From “Public School Costs in New York and Other American States,” Public
Education Information Bulletin, vol. 14, No. 2, 1939. New York State Teachers
Association, Albany, New York.

If frequencies rather than percentages are to be represented on a pie chart,
the size of the angle for each sector can readily be determined by multiplying
360° by the proportionate value of each sector’s frequency to the total number
of frequencies (N).

Pie diagrams are useful for comparative purposes when the sectors represent
proportions of the whole. However, if subdivisions of frequencies of two
or more groups of data are to be compared and N (the total number of
frequencies for each group) varies considerably, the proper construction of the
diagram becomes somewhat complicated, because the area of each circle must
be proportionate to the N of each group compared. If group A is twice the size
of group B, the area (not the radius) of circle A needs to be twice as great as
that of circle B; its radius, therefore, need be only about 1.41 (the square root
of 2) times as great. For such comparisons it is better to use the type of bar
chart shown in Fig. 3:4, or the technique used for Fig. 3:8.
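Two calculations underlie pie diagrams: the sector angle is 360° times a category's share of N, and, when circles for groups of different size are compared, the radius must grow with the square root of N so that the areas stay proportional. The Python sketch below shows both. The function names and the reference radius are illustrative assumptions.

```python
import math


def sector_angle(freq: float, total: float) -> float:
    """Angle in degrees for a sector representing freq out of total."""
    return 360 * freq / total


def radius_for_area(n: float, n_ref: float, r_ref: float) -> float:
    """Radius of a circle whose area is proportional to n, relative to a
    reference circle of radius r_ref drawn for n_ref cases."""
    return r_ref * math.sqrt(n / n_ref)


print(sector_angle(25, 100))               # prints 90.0 (a quadrant)
print(radius_for_area(2000, 1000, 1.0))    # about 1.414: area doubles, radius does not
```

Scaling the radius linearly with N would make the larger circle look four times as big as it should for twice the cases. This is the pitfall the preceding paragraph warns against.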

The pie diagram in Fig. 3:10 illustrates one of the most common graphic
devices used to show how the money of an organization is spent. In this
particular chart, emphasis is given to the relative size of each expenditure
category by the separation of each sector. The catchall category, common to the
procedures of classification and division, is labeled “All Other Charges.”
The base taken for the divisions of the total income is one dollar and the circle
is portrayed in three dimensions, the edge being milled to represent a coin.
Thus, of each income dollar, 45.43 cents went for wages, and slightly less
than one cent was retained in the business.

Fig. 3:10. Income Dollar Distribution

[Figure: pie diagram drawn as a coin, with separated sectors.]


Maps 

Maps have long been used in economic statistics for the portrayal of
statistical information related to geographical areas but are used less
frequently for data of a psychological character. Figs. 3:11 and 3:12, however,
are illustrative of the potentiality of maps for the presentation of psychological
and educational information. Both are maps of the five boroughs of the City
of Greater New York, which are under the jurisdiction of a single Board of
Education.

The average intelligence quotients of fifth-grade public-school children in
114 districts of Greater New York are differentiated into five categories,
described in the key at the lower right-hand corner and portrayed on the
map in Fig. 3:11. Familiarity with the industrial, business, and residential
areas of Greater New York as they existed more than a decade ago would of
course add considerably to the information to be derived from this map.
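Fig. 3:12. Percentage Change of Enrollment in Elementary and Junior High
Schools, 1929-1939, New York City*

[Figure: map of the SCHOOL DISTRICTS of Greater New York (borough label
legible: BROOKLYN), shaded by five categories of percentage change.]

* Reproduced by permission of the Institute of Public Administration, New York City.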
However, it is apparent that the five categories of I.Q. averages are
represented in all the boroughs except Richmond (Staten Island) and that they
are fairly well scattered in each borough. A high-average I.Q. district may be 
adjacent to a low one; thus District 47 in Queens has a high-average I.Q., 
whereas District 46 is low. 

Aside from the proper layout of statistical information on the map, the
main problem in its construction is (1) to establish appropriate geographical
subdivisions and (2) to obtain the necessary descriptive (or sampling)
statistics for each. The geographical subdivisions in the map in Fig. 3:12 are
the 54 school districts of Greater New York, whereas the 114 subdivisions
in Fig. 3:11 were set up in terms of smaller districts on the basis of the
elementary schools serving them.

The map in Fig. 3:12 was constructed to portray the percentage changes of 
enrollment in elementary and junior high schools in Greater New York over 
the decade 1929 to 1939. Five categories of change are keyed at the left center 
of the chart. The trend of families away from the industrial and business 
sections of lower Manhattan (Districts 1, 2, 3, and 4) and from District 30 
across the East River in the Borough of Queens is striking, as is the trend 
toward an increase in outlying districts in Brooklyn, Queens, and the Bronx. 

In order to summarize the information on the map in Fig. 3:12, an
additional graph is desirable. The bar chart shown in Fig. 3:13 is suitable for the
purpose. The predominantly decreasing trend in enrollment in all five
boroughs and the relative extent of both decreases and increases are
immediately apparent from an inspection of this chart.

Maps that describe differences in the socio-economic character of
neighborhoods in a city or in suburban and rural areas are used increasingly today
in sampling statistics for the study of people’s attitudes and opinions. When
such maps are carefully constructed, a fraction of all the geographical
subdivisions for each type of category can be sampled and the entire population
satisfactorily studied from the information derived from only a small part.
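Sampling "a fraction of all the geographical subdivisions for each type of category" is stratified sampling: the map's categories are the strata, and districts are drawn at random within each. The Python sketch below assumes a hypothetical map of 60 numbered districts grouped into three socio-economic types. Neither the grouping nor the 10% fraction comes from the text.

```python
import random


def sample_districts(districts_by_type: dict, fraction: float, seed: int = 0) -> dict:
    """Draw the same fraction of districts at random from each category (stratum),
    taking at least one district from every category."""
    rng = random.Random(seed)
    chosen = {}
    for category, districts in districts_by_type.items():
        k = min(len(districts), max(1, round(fraction * len(districts))))
        chosen[category] = rng.sample(districts, k)
    return chosen


# Hypothetical neighborhood map: district numbers grouped by socio-economic type.
MAP = {
    "high": list(range(1, 11)),     # 10 districts
    "middle": list(range(11, 41)),  # 30 districts
    "low": list(range(41, 61)),     # 20 districts
}
picked = sample_districts(MAP, fraction=0.1, seed=42)
print(picked)
```

Sampling within each category guarantees that every type of neighborhood is represented in proportion to its number of districts. A single random draw from all 60 districts would not guarantee this.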

Pictorial Charts 

Pictorial charts are charts in which pictures or designs are employed that
directly symbolize something essential about the character of the categories
to be described or compared. Such charts are extensively used in the lay
press to make interesting what would otherwise be uninteresting statistical
information to the average reader. They are of special value in emphasizing
the over-all result, bringing out contrasts, and showing the relation of the
parts within a whole. Typical examples are illustrated in Figs. 3:14-3:19. The
statistical information in each of these figures could readily be graphed in
simple bar charts or pie diagrams. However, statistical charts obviously have
greater interest value, and attract and hold the eye, when symbols are
employed for people in different professions (Fig. 3:14), for different types of
motivation (Fig. 3:15), for the earnings of workers in different countries
(Fig. 3:16), for different types of occupations (Fig. 3:17), for different types
of adjustive behavior (Fig. 3:18), and for differences in public opinion
(Fig. 3:19).


Fig. 3:14. Social Work and the Joneses*

[Pictorial chart: SOCIAL WORK-A SMALL PROFESSION. Rows of human figures for
Teachers (more than one million), Professional Engineers, Lawyers, Physicians,
Clergymen, and Social Workers; other legible counts: 245,000 and 165,000.]

* From Public Affairs Pamphlet, Social Work and the Joneses, by Ruth
Lerrigo and Bradley Buell. Published by the Public Affairs Committee, Inc.,
New York City.


The relative size of social work as a profession in relation to five other
professions is strikingly brought out in Fig. 3:14. Each person symbolizes roughly
75,000 professional workers, but the rounded count is also given for each
category. This pictorial chart makes vivid what otherwise might be a
horizontal bar graph of frequencies.
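In a pictograph, each count is converted into a number of symbols at a fixed unit per symbol (here, roughly 75,000 workers per figure). The Python sketch below makes the conversion. The counts 245,000 and 165,000 are the legible figures from Fig. 3:14. Rounding to the nearest whole symbol is an assumption made for text output, since a drawn chart can show a partial figure instead.

```python
def pictograph_units(count: int, unit: int = 75_000):
    """Split a count into whole symbols plus the fraction of one more symbol."""
    whole, remainder = divmod(count, unit)
    return whole, remainder / unit


def pictograph_row(count: int, unit: int = 75_000, symbol: str = "*") -> str:
    """Text cannot show a partial figure, so round to the nearest whole symbol."""
    whole, part = pictograph_units(count, unit)
    return symbol * (whole + (1 if part >= 0.5 else 0))


print(pictograph_units(245_000))   # 3 whole symbols and about 0.27 of another
print(pictograph_row(165_000))     # prints **
```

Because the count is also printed beside each row in Fig. 3:14, the reader is not forced to estimate fractional symbols. That is presumably why the chart gives the rounded figures as well.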

Different classes of motives for the purchase of U.S. Savings Bonds are 
portrayed in Fig. 3:15. However, the size of each category is given in terms of 
the maturity value of the bonds rather than in terms of the number of people 
or families buying them. Dollar discs are therefore employed to represent the 
amount of investment made by each group. This chart, like the preceding, is 
a pictorial substitute for what might otherwise be a simple horizontal bar 
graph. 


Fig. 3:15. Reasons Given by Individual Owners for Systematic Saving Through 
Savings Bonds—and Amounts Invested* 





[Pictorial chart: rows of dollar discs for each reason: Retirement, Emergency,
Cash Estate, Education, Home Building, Dependents, Travel Recreation. Each disc
represents $50,000,000 maturity value of Savings Bonds.]

* From The Graphic Story of United States Savings Bonds, Pamphlet of the U.S.
Department of Treasury, 1939.


A comparison of the average annual real income of workers in seven different
countries is made possible by the pictorial chart in Fig. 3:16, a comparison
for which a simple vertical bar graph might otherwise be used.

Fig. 3:16. How Much Does a Worker Earn?*

[Pictorial chart: average annual real income of workers in the U.S., Britain,
France, Germany, Japan, Italy, and China; one legible value: $1381.]

* From Public Affairs Pamphlet, What Foreign Trade Means to You, by Maxwell S.
Stewart. Published by the Public Affairs Committee, Inc., New York City.

Fig. 3:17 shows a pictorial device for emphasizing the character and size
of the proportionate parts of a whole, ordinarily shown by a pie diagram. The
statistical base is taken as 100 and consequently the data of each category
can readily be interpreted in percentages. Thus, at the beginning of 1942,
4 per cent of the “Labor Force” in the United States were in the Armed
Forces; 8 per cent were in government service, etc. Our total labor force at
the time was 52 million, according to the last line of the chart.
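Because the base of Fig. 3:17 is 100, each "out of every hundred" figure can be converted back into an approximate head count by applying it to the stated total of 52,000,000. The Python sketch below does this for the categories that are legible in the chart; the remaining categories are not recoverable. Because the per-hundred figures are themselves rounded, the resulting counts are only approximate.

```python
LABOR_FORCE = 52_000_000  # "Our labor force numbers 52,000,000"

# Out of every hundred, as legible in Fig. 3:17 (the other categories are not recoverable).
PER_HUNDRED = {
    "Armed Forces": 4,
    "Governments": 8,
    "Trade": 14,
    "Transportation, finance, service and construction": 18,
    "Agriculture": 17,
    "Mining": 2,
}


def absolute_counts(per_hundred: dict, total: int) -> dict:
    """Convert 'out of every hundred' figures into approximate head counts."""
    return {name: total * k // 100 for name, k in per_hundred.items()}


counts = absolute_counts(PER_HUNDRED, LABOR_FORCE)
for name, n in counts.items():
    print(f"{name:<50}{n:>12,}")
```

For example, the 4 in every hundred in the Armed Forces correspond to about 2,080,000 persons.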

The extent to which epileptics can make an occupational adjustment is 
portrayed in Fig. 3:18. In this pictorial chart, 20 such persons are taken as 
the statistical base and the behavior of epileptics is differentiated into four 
categories. A pie diagram or a bar chart like that shown in Fig. 3:3 might 
otherwise have been used for a less striking presentation of this statistical 
information. 

In Fig. 3:19 a pictorial device is combined with a statistical chart, viz., a
series of grids that portrays the percentage volume of replies to a question
used in a Fortune Survey of public opinion by Elmo Roper. The character
of the replies is tabulated in percentages at the top of the chart, as is
customary in reporting such survey data. These figures are then dramatized by both
the grids and the pictures. This particular survey question is an interesting
historical “memento” of American public opinion in 1940.


Fig. 3:17. Out of Every Hundred in Our Labor Force*

[Pictorial chart: OUT OF EVERY HUNDRED IN OUR LABOR FORCE: 4 are in Armed
Forces; 8 employed by governments; 14 employed in trade; 18 in transportation,
finance, service and construction; 17 work in agriculture; 2 in mining
(remaining categories not legible). Our labor force numbers 52,000,000,
including the Army and Navy.]

* From the New York Times Magazine, February 15, 1942. Reproduced by permission
of the New York Times and the Pictograph Corporation.


EXERCISES 

1. Given a total group of 385 people, 110 of whom are women, 125 men, 80 girls, 
and 70 boys, state the relation of each of these four sub-groups to the whole in 
terms of (a) proportions, (b) percentages. 

2. In the preceding example, what proportion of the total group consists of adults? 
Of boys and girls? 





















Fig. 3:18. Epilepsy: The Ghost Is Out of the Closet*

[Pictorial chart headed “Epileptics Can Work!” Of every 20 epileptics who are not self-supporting, 16 were found to be employable. Of the remaining 4, one did not want to work; two, according to their past records, were not reliable; and only one could not work because of sickness.]

* From Public Affairs Pamphlet, Epilepsy: The Ghost Is Out of the Closet, by Herbert Yahraes. Published by the Public Affairs Committee, Inc., New York City.

3. Round off the following numbers to two decimal places:

a. 43.4083 

b. .11113 

c. 106.556 

d. 1.8555 

e. 3.1645 

4. Set up a table for the following data, compare the results of each sub-group in terms of percentages, and give your reasons for choosing the base or bases for the percentage comparisons:

A group of 2000 people is composed of 1000 men and 1000 women. Of the 
men, 446 voted for the Democratic candidate in the last election; 403 voted for
the Republican candidate; 76 voted for other candidates; and the remainder 
did not vote. Of the women, 425 voted for the Democratic candidate; 437 for 
the Republican candidate; 42 for the other candidates; and the remainder did 
not vote. Of the men who voted for the Democratic candidate, 200 were over 
35 years of age. Of the men voting for the Republican candidate, 250 were 
over 35 years of age. Of the men voting for the other candidates, 32 were over 
35 years of age. And of the men who did not vote for any candidate, 60 were 
over 35 years of age. Of the women who voted for the Democratic candidate, 
195 were over 35 years of age. Of the women who voted for the Republican 
candidate, 220 were over 35 years of age. Of the women who voted for other 
candidates, 10 were over 35 years of age. And of the women who did not vote 
for any candidate, 75 were over 35 years of age. 






Fig. 3:19*

If Hitler wins, should we:

    Find some way of continuing our European commercial business
      with Hitler's new Europe ........................................ 44.2%
    Make every effort to develop business only with countries
      not under Hitler's control ...................................... 40.0%
    Don't know ........................................................ 15.8%

[Grid chart: each grid = 100%; each block = 1%.]

* Reprinted from the August, 1940, Fortune survey by special permission of the editors.


5. Compute the ratio of college graduates to non-college graduates for the following groups:

    Groups    College Graduates    Non-College Graduates
      A              45                     175
      B             225                     110
      C              65                     195
      D             135                     155
      E              10                     280

6. Gallup reported the following results in a public opinion poll on the question: “Which political party, the Republican or the Democratic, do you think is most interested in persons of above average income?” (New York World Telegram, February 11, 1946.)

    Democratic       14%
    Republican       57%
    No Difference    17%
    No Opinion       12%












Assume the total sample consisted of 2500 cases. 

a. What is the percentage difference between those answering “Democratic” 
and “Republican”? 

b. What is the percentage excess of those expressing “No Difference” over 
those with “No Opinion”? 

c. How many more, in percentages, answered either “Democratic” or “Republican” as compared with those who did not answer either “Democratic” or “Republican”?

7. Compute the average percentage of the total group majoring in psychology for the following five college groups:

    College    Total Number of Students    Per Cent Majoring in Psychology
       A                 3000                           6%
       B                  500                          10%
       C                 1200                          15%
       D                 2000                          11%
       E                  900                           2%

8. What is the essential difference between a simple bar graph and a bar trend graph? 
For what purpose is each used? 

9. What is the essential difference between a map chart and a pictorial chart? 
For what purpose is each used? 

10. Choose an appropriate graphic device for portraying each group of data in 
Exercises 5, 6, and 7. 



CHAPTER 4 


The Correlation of Categorical Data 


A. THE CROSS-TABULATION OF CATEGORICAL DATA 

Whether there is any relationship between the data of two attributes or qualities is determined by methods of analysis known as correlation. As the origin of the word correlation implies, the procedure is a means of determining whether there is any association or “co-”relation, that is, relation between the differentiated data of two attributes.

Correlation methods are useful and relevant for problems in descriptive statistics, although in practice they are more often applied to problems in analytical and sampling statistics. As indicated in Chapter 1, these latter are problems in which the statistical data constitute samples of larger populations. When used for problems in descriptive statistics, a correlation coefficient simply summarizes the degree of co-relation found between two non-variable or variable attributes. When used for problems in sampling statistics, a correlation coefficient not only summarizes the relationship between the sample data of two attributes, but also provides the basis for an estimate of the correlation between the populations or universes from which the samples are derived.

In this chapter we shall first describe the basis for organizing categorical data in descriptive statistics so that we can determine by inspection whether the data of two attributes are correlated. We shall then develop the fundamental methods used for the actual computation of coefficients of correlation for the categorical data of non-variable attributes, as well as for the data of variables that are grouped into broad classes.

Cross-Tabulation Essential to Correlation 

The essence of what is implied by correlation can be exemplified by the study of a 2 by 2 correlation chart, often referred to as a fourfold tabulation because of the four cells of cross-tabulated data. Table 4:1, which shows such a chart, is simply a cross-tabulated distribution of listener attitudes for two non-adjacent sequences, or parts, of a radio program.* The attitudes of each listener were studied in the Program Analyzer Laboratory of the Columbia Broadcasting System. Listeners whose attitudes were favorable toward the two sequences are indicated by a plus sign; those whose attitudes were unfavorable are indicated by a minus sign. Although attitudes constitute an attribute or trait which is a variable, the data in this study were dichotomized from a seven-point attitude scale to provide two classes: favorable and unfavorable attitudes. The dichotomization of the listeners' attitudes for each program sequence was made near the middle of each distribution. The actual number of cases in each class of the dichotomized attributes is given by the marginal totals at the bottom and at the right of the fourfold table. Thus, of the 59 subjects, 31 had favorable attitudes and 28 had unfavorable attitudes toward the earlier sequence. Similarly, for the later sequence, 30 had favorable and 29 had unfavorable attitudes.

* J. G. Peatman and Tore Hallonquist, The Patterning of Listener Attitudes Toward Radio Broadcasts: Methods and Results, Stanford Univ. Press, Stanford University, 1945.
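Each frequency in a fourfold table counts paired observations, here the two attitudes of one listener, and the marginal totals are simply row and column sums over those pairs. The Python sketch below illustrates this by rebuilding the 59 pairs from the cell frequencies of Table 4:1 (a = 2, b = 29, c = 27, d = 1). The listener-by-listener records are not given in the text, so these pairs are a reconstruction for illustration, not the original data:

```python
from collections import Counter

# One paired observation per listener:
# (attitude toward the earlier sequence, attitude toward the later sequence).
# Rebuilt from the cell frequencies of Table 4:1 for illustration only.
pairs = ([("+", "-")] * 2 + [("+", "+")] * 29 +
         [("-", "-")] * 27 + [("-", "+")] * 1)

# Cell frequency for each (earlier, later) combination.
cells = Counter(pairs)

# Marginal totals: rows (n_r) classify the earlier sequence, columns (n_c) the later.
row_totals = Counter(earlier for earlier, _ in pairs)
col_totals = Counter(later for _, later in pairs)

print("a =", cells[("+", "-")], " b =", cells[("+", "+")])
print("c =", cells[("-", "-")], " d =", cells[("-", "+")])
print("n_r:", dict(row_totals), " n_c:", dict(col_totals), " N =", len(pairs))
```

The marginal totals (31 and 28; 29 and 30) come out of the same counting as the cells, which is why they cannot by themselves reveal how the two attitudes are paired.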

Inspection of only the marginal totals does not provide any insight into a possible correlation between the attitudes of the listeners. The marginal totals simply summarize the results for each attribute separately. In order to study the possibility of correlation between the attributes, the cross-tabulations of the data into each of the four cells must be analyzed. The statistical frequencies of these cross-relationships are given in each of the four cells in Table 4:1. In the case of correlated data, a statistical frequency in any cell represents a pair of observations or measurements, one for each attribute correlated, which are related by virtue of a common property. In this case, a paired observation represents the attitudes of the same individual for the two sequences of the broadcast.

Table 4:1. Cross-Tabulation of Listeners' Attitudes Toward Two Non-Adjacent Sequences of a Radio Program

                                     Attitudes Toward Later Sequence
                                        -            +           n_r
    Attitudes Toward       +         a:  2        b: 29           31
    Earlier Sequence       -         c: 27        d:  1           28
                         n_c            29           30        N = 59

The Correlation Chart as a Geometric Field

The cross-tabulations in Table 4:1 were thus made for attributes which are actually variables but were dichotomized into two broad classes. Because of this fact, there is no question as to the order of the arrangement of the two classes of each attribute. The more favorable attitudes in each case are symbolized by a plus sign, and the less favorable attitudes by a minus sign.






The fourfold table is analogous to a geometric field with the origin in the center, at the right-angle intersection of the two lines dichotomizing each attribute. The four cells correspond to the usual four quadrants of such a field and have been designated a, b, c, and d. The order inherent in the attributes has been laid off along the vertical side of the square (the ordinate) and the horizontal side of the square (the abscissa) so that the quadrants themselves have the signs usually used in such a field. That is, writing the sign for the earlier sequence first and the sign for the later sequence second, the b and c cells are positive quadrants (b = +, + and c = -, -), and the a and d cells are negative quadrants (a = +, - and d = -, +). Having utilized the order inherent in the two attributes and laid off each one accordingly, we have conformed to the implications of a geometric field, so that a relationship obtained for the data will be truly positive or negative. Thus, if the major proportion of the cases is distributed in the b and c quadrants, and a considerably smaller proportion therefore lies in the a and d quadrants, the correlation is positive; and, contrariwise, if a considerable majority of the cases is distributed in the negative quadrants, a and d, the correlation is negative.

Let us now examine the implications of the upper left-hand cell in Table 4:1, 
in which 2 of the 59 cases are entered. As indicated by the symbols at the left 
and top, these two cases represent listeners who had favorable attitudes 
toward the earlier sequence and unfavorable attitudes toward the later 
sequence. They constitute but a small proportion of the listeners in their 
respective column and row. That is, of the 31 individuals who had favorable 
attitudes toward the earlier sequence, only 2 had unfavorable attitudes
toward the later sequence. And of the 29 individuals who had unfavorable 
attitudes toward the later sequence, only 2 had favorable attitudes toward 
the earlier sequence. 

The 29 cases in the upper right-hand cell represent listeners whose attitudes 
toward both sequences were similar—all were favorable. The 27 cases in the 
lower left-hand cell also represent individuals with similar attitudes for both 
sequences—in this case, all unfavorable. 

Thus it is apparent that inspection of a fourfold table like Table 4:1 usually reveals whether two attributes or traits are correlated. If a considerable majority of the cases are in either set of the diagonally related cells (a and d, or b and c), there is evidence of correlation. From the data in Table 4:1 we see that practically all the listeners had attitudes toward the later sequence which corresponded with their attitudes toward the earlier sequence (negatives with negatives and positives with positives). The correlation for these data is therefore not only high but positive. It is positive inasmuch as the individuals with favorable attitudes toward one sequence are, on the whole, the individuals with favorable attitudes toward the other sequence; and similarly, the individuals with unfavorable attitudes toward one sequence are on the whole the individuals with unfavorable attitudes toward the other sequence. The correlation coefficient itself is high (for its computation, see page 94) because all but three of the 59 cases are in the positively associated cells, b and c.
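The inspection just described amounts to comparing the totals of the two diagonals. A small Python sketch using the cell labels of Table 4:1; the function name `diagonal_split` is illustrative:

```python
# Cell frequencies of Table 4:1. Cells b (+, +) and c (-, -) are the positively
# associated quadrants; a (+, -) and d (-, +) are the negatively associated ones.
table_4_1 = {"a": 2, "b": 29, "c": 27, "d": 1}

def diagonal_split(t):
    """Return (cases in the positive diagonal b + c, cases in the negative diagonal a + d)."""
    return t["b"] + t["c"], t["a"] + t["d"]

positive, negative = diagonal_split(table_4_1)
n = positive + negative
print(f"b + c = {positive} of {n};  a + d = {negative} of {n}")
```

It reports 56 of the 59 cases in b and c, the "all but three" of the text.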

An example of a high degree of negative correlation is illustrated by the cross-tabulated data in Table 4:2. This represents a hypothetical redistribution of the data in Table 4:1. The correlation coefficient for these data would be identical in value with that for the data in Table 4:1, but it would be expressed as a negative value. Such a result is considered negative because a large majority of the listeners switched their attitudes toward the later sequence as compared with their attitudes toward the earlier sequence. The 27 individuals in the upper left-hand cell of Table 4:2 represent listeners whose attitudes toward the earlier sequence were favorable, but were unfavorable for the later sequence; similarly, the 29 cases in the lower right-hand cell represent listeners whose attitudes toward the earlier sequence were unfavorable, but were favorable for the later sequence. In other words, the majority of the cross-tabulated frequencies are now in the diagonal cells a and d, which signify negative rather than positive associations.

Table 4:2. Hypothetical Redistribution of the Cross-Tabulated Data in Table 4:1 to Illustrate Negative Correlation

                                     Attitudes Toward Later Sequence
                                        -            +           n_r
    Attitudes Toward       +         a: 27        b:  1           28
    Earlier Sequence       -         c:  2        d: 29           31
                         n_c            29           30        N = 59

An Example of No Correlation

No correlation between the cross-tabulated data in Table 4:1 would be indicated by a similar proportion of the cases being distributed in each of the four quadrants or, more generally, by each row of the table being divided between the two columns in about the same proportion. Table 4:3 shows the data that yielded the marginal totals in Table 4:1, but redistributed so as to illustrate no correlation. The 31 listeners whose attitudes toward the earlier sequence were favorable are now about evenly divided in their attitudes toward the later sequence. Similarly, half of the 28 individuals whose attitudes toward the earlier sequence were unfavorable now have favorable attitudes toward the later sequence. There is thus no correlation between their attitudes, as distributed in the four cells of Table 4:3. This is another way of saying that listeners who had favorable attitudes toward the earlier sequence are just as likely to have favorable as unfavorable attitudes toward the later sequence; similarly, listeners who had unfavorable attitudes toward the earlier sequence are just as likely as not to have favorable attitudes toward the later sequence.

Table 4:3. Hypothetical Redistribution of the Cross-Tabulated Data in Table 4:1 to Illustrate No Correlation

                                     Attitudes Toward Later Sequence
                                        -            +           n_r
    Attitudes Toward       +         a: 15        b: 16           31
    Earlier Sequence       -         c: 14        d: 14           28
                         n_c            29           30        N = 59
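The book's own computation of the coefficient for Table 4:1 is given on page 94 and may use a different method. Purely as an illustration, the Python sketch below applies the phi coefficient, a common index for a fourfold table, to Tables 4:1, 4:2, and 4:3. Because b and c are the positive quadrants in this layout, the numerator is written bc - ad:

```python
from math import sqrt

def phi(a, b, c, d):
    """Phi coefficient for a fourfold table laid out as in Tables 4:1-4:3,
    where b (+, +) and c (-, -) are the positively associated cells."""
    numerator = b * c - a * d
    denominator = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator

tables = {
    "Table 4:1": (2, 29, 27, 1),
    "Table 4:2": (27, 1, 2, 29),
    "Table 4:3": (15, 16, 14, 14),
}
for name, cells in tables.items():
    print(f"{name}: phi = {phi(*cells):+.2f}")
```

It prints +0.90, -0.90, and +0.02: the same magnitude with opposite signs for the first two tables, as the text states, and practically zero for the third.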


The Correlation of Non-Variable Attributes 

As already indicated, Tables 4:1, 4:2, and 4:3, illustrating the correlation of dichotomized data, were based upon the data of attributes which in reality are variables. By virtue of this fact, the dichotomized data were laid out on the horizontal and vertical sides of the 2 by 2 matrix so as to give a result that would correspond to the meaning of the usual quadrants of coordinate axes in a geometric field. However, for non-variable attributes, there is no order inherent among the categories of data themselves. In other words, for truly non-variable attributes the decision as to how the categories of each attribute shall be laid off on the vertical and horizontal sides of the cross-tabulation matrix is purely arbitrary. Consequently, the concepts of positive and negative correlation have no meaning for the cross-tabulated results of such data. It is rather a question of whether there is any correlation or association. If there is any correlation, it must be interpreted by describing the relationship in words.

This situation, characteristic of the correlation of non-variable attributes, is illustrated by Table 4:4. The data are of the kind often obtained in market research investigations. One hundred persons, 50 men and 50 women, are asked: “Have you ever used K brand of soap?” The marginal totals at the right of the table indicate that 50 of the total group answer “Yes” and 50 answer “No.” Thus, these data give an even division not only for the sexes but also for the responses. Inspection of the cross-tabulated data in the four cells indicates that there is considerable correlation between the sex of the respondents and the character of their replies. Thus, 40 of the 50 women answer “Yes,” whereas only 10 of the 50 men answer in the affirmative.

Table 4:4. Cross-Tabulation of the Data of Two Non-Variable Attributes (Commodity Use by Men and Women)

                      Sex of Respondent
                   Women        Men        n_r
    Yes            a: 40      b: 10         50
    No             c: 10      d: 40         50
    n_c               50         50    N = 100


If the results of this cross-tabulation could be interpreted as were the data in Tables 4:1, 4:2, and 4:3, the association would be described as negative correlation. The distribution of the cross-tabulated data in Table 4:4 is most like that in Table 4:2; 80% of the cases are in the “negative” quadrants (a and d). However, to describe the correlation in Table 4:4 as negative is not warranted. Whether the data on the sex of the respondents are arranged as in Table 4:4 or as in Table 4:5 is purely arbitrary. Table 4:5 presents exactly the same correlation as Table 4:4, but the replies of the male respondents are cross-tabulated in the left-hand cells (a and c) and those of the female respondents in the right-hand cells (b and d).

Table 4:5. Cross-Tabulation of the Data (from Table 4:4) of Two Non-Variable Attributes (Commodity Use by Men and Women)

                      Sex of Respondent
                    Men        Women       n_r
    Yes            a: 10      b: 40         50
    No             c: 40      d: 10         50
    n_c               50         50    N = 100







The interpretation of a correlation between two non-variable attributes 
thus involves verbalizing the relationship as it is observed to exist, rather 
than labeling the result as negative or positive. In this case, the women 
generally have used the brand of soap, and the men generally have not. 
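The arbitrariness of the sign can be checked directly. The Python sketch below computes ad - bc, the numerator on which fourfold coefficients such as Yule's (given later in this chapter) are built, for the two equivalent layouts of Tables 4:4 and 4:5. The helper name is illustrative:

```python
# Cells of Table 4:4 (columns: Women, Men) and Table 4:5 (columns: Men, Women).
# In both tables the "Yes" row (has used Brand K) is above the "No" row.
table_4_4 = {"a": 40, "b": 10, "c": 10, "d": 40}
table_4_5 = {"a": 10, "b": 40, "c": 40, "d": 10}

def cross_product_difference(t):
    """ad - bc, the numerator shared by common fourfold coefficients."""
    return t["a"] * t["d"] - t["b"] * t["c"]

print(cross_product_difference(table_4_4))  # 1500
print(cross_product_difference(table_4_5))  # -1500
```

Interchanging the two sex columns leaves the size of the association unchanged and reverses only its sign, which is why the sign carries no meaning here.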

The foregoing remarks also apply to a correlation between any two attributes, one of which is a non-variable and the other a variable. Thus, a correlation between sex and height or between sex and ability cannot be described as either positive or negative. It can only be stated in words: men tend to be taller than women, or one sex might tend to do better than the other on a particular ability test.

The Correlation of Polytomous Attributes: Market Research Data

In research, the data of non-variable attributes are often classified in more than the two categories characteristic of a dichotomy. We saw earlier that attributes divided into three categories yield a trichotomy, and, if into more than three categories, a polytomy. Methods for computing a correlation coefficient for the cross-tabulated data of polytomous non-variable attributes have not been fully developed. The available methods are based upon the assumption that trichotomous or polytomous divisions are derived from a variable (as they often are) rather than from a non-variable attribute. Pearson's method for the Coefficient of Mean Square Contingency, developed later in this chapter, is used to measure the degree of correlation for such situations. (See also pages 443 ff.)
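Pearson's coefficient is usually written C = sqrt(chi² / (N + chi²)), where chi² sums, over all cells, the squared difference between each observed frequency and the frequency expected from the marginal totals alone, divided by that expected frequency. The following is a minimal Python sketch of that textbook formula for a table with any number of rows and columns, not the book's own worked procedure; for a small check it reuses the 2 by 2 data of Table 4:4:

```python
from math import sqrt

def contingency_coefficient(table):
    """Pearson's coefficient of mean square contingency, C = sqrt(chi2 / (N + chi2)),
    for a table given as a list of rows of observed frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Frequency expected if the two attributes were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return sqrt(chi2 / (n + chi2))

# Table 4:4: rows Yes/No, columns Women/Men.
print(round(contingency_coefficient([[40, 10], [10, 40]]), 2))  # 0.51
```

C is never negative, which suits non-variable attributes, but it cannot reach 1.00 for a table with a finite number of cells; for a 2 by 2 table its maximum is about .71.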

Table 4:6. Attitudes of Economic Groups Toward Private vs. Government Management: The 3 by 4 Cross-Tabulation of Two Attributes

                                      Economic Groups
                      Low    Lower Middle    Upper Middle    High    n_r

Table 4:6 presents a 3 by 4 cross-tabulation based on data obtained from a market research investigation reported during the war by the Psychological Corporation.* One of the questions asked of the nation-wide sample of respondents was:

Do you think that business companies will do a better job if they are allowed to 
keep on under their own management, or if the Government takes them over and runs 
them completely? 

Of the 2500 respondents, 67% thought that business companies would do a better job if kept under their own management. Only 14% thought that the companies would do better if under government management. Nineteen per cent did not know. These are the over-all results. If, however, the data are analyzed in relation to income groups, the results shown in Table 4:6 are obtained. These data reveal that there is a correlation between economic status and the nature of the answer to the question. A relatively greater proportion of the higher income groups felt that the companies would do a better job under their own management. Conversely, a relatively greater proportion of the lower income groups felt that a better job would be done under government management. Moreover, a greater proportion of the higher income groups had a definite opinion. In fact, only 12 of the 250 respondents in the highest income group gave a DK (“don't know”) answer, whereas 150 of the 500 respondents in the lowest income group gave this answer.

The cross-tabulated results in Table 4:6 can be interpreted more readily if 
the frequencies of each cell are converted into percentages. Such a conversion 
immediately raises a question as to what total (N) or set of totals shall be 
used as the base. The answer depends upon the type of comparison to be 
made. Since the original differentiation of the respondents into income groups 
orients the comparison in this direction, the totals of each of the four income 
groups provide the most appropriate bases for the percentages of each cell. 
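The effect of the choice of base can be made concrete with a small Python sketch. To keep it short, it uses the fourfold data of Table 4:4, whose frequencies are given in the text, rather than Table 4:6. Taking each sex group as the base plays the same role as taking each income group as the base for Table 4:7. Function names are illustrative:

```python
# Cross-tabulated frequencies from Table 4:4: answers (rows) by sex (columns).
table = {
    "Yes": {"Women": 40, "Men": 10},
    "No":  {"Women": 10, "Men": 40},
}

def column_percentages(table):
    """Percentages with each column total as the base (each column sums to 100)."""
    columns = next(iter(table.values())).keys()
    col_totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {r: {c: 100 * row[c] / col_totals[c] for c in columns}
            for r, row in table.items()}

def grand_total_percentages(table):
    """Percentages with the grand total N as the single base."""
    n = sum(sum(row.values()) for row in table.values())
    return {r: {c: 100 * v / n for c, v in row.items()} for r, row in table.items()}

print(column_percentages(table)["Yes"])       # {'Women': 80.0, 'Men': 20.0}
print(grand_total_percentages(table)["Yes"])  # {'Women': 40.0, 'Men': 10.0}
```

With the column totals as bases, the comparison reads directly as "80% of the women but only 20% of the men". The grand-total base gives figures that do not answer the question being asked.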

A comparison of the respondents' answers to the question could be made by using the total 2500 cases as the base for the percentages in each cell of the 3 by 4 table. However, the most relevant picture of the relation between the respondents' opinions and their economic status is that shown in Table 4:7. According to this table, 90% of the high income, 76% of the upper middle income, and 66% of the lower middle income group felt that business companies would do a better job under their own management; but only 46% of the low income group were of this opinion. The proportion of the affirmative answers decreases as the economic status of the respondents decreases. On the other hand, the proportion of replies in favor of government management increases as the economic status decreases, from 5% for the high income group to 24% for the low income group; however, the rate of increase here is not the same as the rate of decrease in the former case. The difference is accounted for by the DK's. As the economic status of the respondents decreases, the proportion of DK's increases: only 5% of those in the high income group gave DK answers, whereas 30% in the low income group gave DK answers. (Between the high and the low income groups, the 44-point drop for private management is thus made up of a 19-point rise for government management and a 25-point rise in DK's.)

* The Psychological Corporation, “The Eighth Nation-Wide Social and Experimental Survey,” New York, 1943.

Table 4:7. The Cross-Tabulated Frequencies in Table 4:6 Converted into Percentages

                                              Economic Groups
                               Low    Lower Middle    Upper Middle    High      All
    Private Management         46%        66%             76%          90%      67%
    Government Management      24%        14%              9%           5%      14%
    Don't Know                 30%        20%             15%           5%      19%
                              100%       100%            100%         100%     100%
    (N)                       (500)     (1000)           (750)        (250)   (2500)


From the point of view of a correlational analysis, the data in Tables 4:6 and 4:7 might perhaps be clearer if the respondents' answers were reclassified as in Tables 4:8 and 4:9. As indicated in Chapter 2, a complication arises in the analysis of market research data when there is a considerable number of DK answers. In the present case, the answers of the respondents can first be dichotomized into two categories: (1) those who have an opinion, and (2) those who do not have an opinion. This reclassification of the data yields the results shown in Table 4:8. It is now clear that having an opinion or having no opinion on this question is correlated with economic status, for those in the higher income groups were more likely to have an opinion, and those in the lower income groups were less likely to have an opinion.

Table 4:8. Reclassification of the Data in Table 4:7, Dichotomizing Respondents According to Those Who Had an Opinion and Those Who Did Not Have an Opinion

                                              Economic Groups
                               Low    Lower Middle    Upper Middle    High    Total Group
    Those with an Opinion      70%        80%             85%          95%        81%
    Those with No Opinion
      (DK's)                   30%        20%             15%           5%        19%
                              100%       100%            100%         100%       100%
    (N)                       (500)     (1000)           (750)        (250)     (2500)

The cross-tabulation in Table 4:9 presents the trend among the 81 per cent of the respondents who had an opinion one way or the other. It is now even clearer than it was from Tables 4:6 and 4:7 that there is a correlation between economic status and answer to the question. The higher the income, the greater the proportion of answers in favor of private management; and conversely, the lower the income status, the greater the proportion of answers in favor of government ownership. However, even in the low income group, practically two-thirds (66%) of the members of this group who had an opinion felt that business companies would do a better job if allowed to keep on under their own management.

Table 4:9. Reclassification of the Data in Table 4:6, Dichotomizing the Replies of Respondents Who Had an Opinion

                                              Economic Groups
                               Low    Lower Middle    Upper Middle    High    Total Subgroup
    Private Management         66%        83%             89%          95%         83%
    Government Management      34%        17%             11%           5%         17%
                              100%       100%            100%         100%        100%
    (N)                       (350)      (800)           (638)        (238)      (2026)
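Table 4:9 drops the DK's and recomputes the percentages with only the respondents who had an opinion as the base. The re-basing can be approximated in Python directly from the rounded percentages of Table 4:7, under the assumption that those rounded figures are close enough to the underlying frequencies. The function name is illustrative:

```python
# Column percentages from Table 4:7: (private, government, don't know).
table_4_7 = {
    "Low": (46, 24, 30),
    "Lower Middle": (66, 14, 20),
    "Upper Middle": (76, 9, 15),
    "High": (90, 5, 5),
}

def rebase_to_opinion_holders(private, government, dont_know):
    """Re-express the two opinions as percentages of those who had an opinion.
    dont_know is accepted so that a whole row can be passed, and is then discarded."""
    with_opinion = private + government
    return 100 * private / with_opinion, 100 * government / with_opinion

for group, pcts in table_4_7.items():
    p, g = rebase_to_opinion_holders(*pcts)
    print(f"{group:13s} private {p:5.1f}%  government {g:5.1f}%")
```

Working from rounded percentages reproduces Table 4:9 to within about half a point. For the lower middle group it gives 82.5% rather than 83%, presumably because the table was computed from the frequencies of Table 4:6.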

The data used for the preceding tables are based on attributes at least one of which is a variable, viz., income or economic status. In market research investigations, however, economic status is usually treated by a breakdown into three, four, or five classes, rather than by attempted measurements or ratings on a continuous scale. That this attribute is a variable rather than a non-variable should be apparent from the order inherent in the arrangement of the data; that is, the classes are arranged in order of economic groupings, from lowest standards of living or income to highest standards of living or income. Sometimes this attribute is quantitatively differentiated by using actual income in dollars as the index of economic status. However, research has indicated pretty clearly that a few economic groupings based upon several factors such as type of home, home conveniences, neighborhood, etc., as well as dollar income, give a better index of socio-economic status than dollar income alone.

The other attribute, respondents' replies to the question, can be interpreted as a variable less readily, if at all. This is because, as was brought out in the reclassification of the data in Tables 4:8 and 4:9, the replies actually yield two different attributes. The first attribute, as shown in Table 4:8, is “Having an Opinion” or “Not Having an Opinion.” This is a dichotomy, and there is little question but that this attribute is non-variable: a person either has an opinion or he hasn't. The second attribute, as indicated in Table 4:9, represents a twofold division of the replies of the respondents who had an opinion. This situation is not so simple with respect to the logic of characterizing the results as a true dichotomy of a non-variable attribute. It might be argued, for example, that the form of the question itself has forced the dichotomy, that in reality the respondents may have had many different shades of opinion with respect to private management and government management, or a combination thereof. In practice, such an attribute is often treated as if it comprises a variable attribute, with the shades of opinion theoretically distributed according to a standard type of distribution. If the assumption can be made that the shades of opinion, as well as the differences in economic status, are distributed in a form similar to the normal probability curve, then the statistical method for the Contingency Coefficient can be used without further qualification to compute a correlation coefficient for the cross-tabulated data in Table 4:9. (See p. 94.)

B. METHODS FOR THE CORRELATION OF CATEGORICAL DATA

The extent to which the categorical data of two attributes or qualities are 
correlated can be expressed by a coefficient. Correlation is not an all-or-none 
affair; it is not a question of whether two attributes are perfectly correlated 
or not correlated at all. Correlation is always a question of the degree of such 
relationship as may be present. 

Mathematical methods that have been developed to express the degree of correlation between two attributes, whether variable or non-variable, yield an index which may vary in value from no correlation (indicated by zero) to perfect correlation (indicated by a coefficient of 1.00). In the case of variable data for which positive and negative directions of correlation are relevant, the correlation coefficients may vary from a perfect positive correlation (1.00) through zero to a perfect negative correlation (-1.00). A coefficient of .90 expresses a high degree of positive association, whereas a coefficient of .10 expresses a very low degree of positive association. Similarly, a coefficient of -.90 expresses a high degree of negative association, and a coefficient of -.10 expresses a very low degree of negative association. We shall see that not all methods of correlation yield coefficients which are strictly comparable in all respects. Coefficients of the values 1.00, 0.00, and -1.00 are always comparable for any method of correlation. They signify, respectively, perfect positive correlation, no correlation, and perfect negative correlation. However, coefficients with values between these limiting points vary in their implications according to the method of correlation used.

In the historical development of methods for computing correlation coefficients, most attention has been given to those that index the degree of correlation between variable attributes. This is because most research studies in psychology and related fields have been concerned with data of variables rather than non-variables. The most widely used method of indexing the degree of correlation between two variables is the one referred to in Chapter 1, the product-moment method originally developed by Galton and perfected by Karl Pearson. (See Chapter 9.) Many years ago, however, Yule presented methods that yield an index for the correlation between non-variable as well as variable attributes which are dichotomized or divided into only a few categories.

Yule's Coefficient of Association (A) for Dichotomized Non-Variable Attributes*

The method developed by Yule for indexing the degree of correlation between two dichotomized non-variable attributes yields a coefficient known as the Coefficient of Association. It is simple to compute, and we shall illustrate it with the data from Table 4:4, which are reproduced in Table 4:10.

Table 4:10. Correlation Between Sex and Use of Brand K of Soap

                              Sex of Respondents
Used Brand K?            F              M               nr
   Yes                a = 40         b = 10             50
   No                 c = 10         d = 40             50
   nc                    50             50          N = 100

These data give in a fourfold table the cross-tabulation of the sex and answers
of respondents to the question: “Have you ever used K brand of soap?”

* G. U. Yule and M. G. Kendall, An Introduction to the Theory of Statistics, Griffin,
London, 12th ed., 1940, p. 44. Cf. also G. U. Yule, “On the Methods of Measuring the
Association Between Two Attributes,” Journal of the Royal Statistical Society, 75:576-642,
1912.

Cell a of Table 4:10 indicates that 40 of the 50 female respondents had 
used Brand K; cell b indicates that only 10 of the 50 males had used this 
brand; cell c indicates that 10 of the females had not used this brand; and 
cell d indicates that 40 of the males had not used this brand. We have already 
seen that this cross-tabulation implies a considerable correlation between 
sex and soap usage. Generally the women have used the soap and generally 
the men have not. The actual degree of correlation for these data may be 
expressed by the Coefficient of Association, which is .88. It is computed by 
the following formula: 

$$A = \frac{ad - bc}{ad + bc} \qquad [4:1]$$
(Yule's Coefficient of Association)


in which a, b, c, d represent the statistical frequencies of these respective cells
in the 2 by 2 table. A is therefore computed as follows:

$$A = \frac{(40)(40) - (10)(10)}{(40)(40) + (10)(10)} = \frac{1600 - 100}{1600 + 100} = \frac{1500}{1700} = .88$$
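As a check on the arithmetic, Formula [4:1] can be written as a minimal Python sketch. The function name `yule_a` and the argument order (a and b across the first row of the fourfold table, c and d across the second, as in Table 4:10) are our own conventions, not the book's. Interchanging the two columns reverses the sign of A without changing its size, which bears on the point made in the next paragraph.

```python
def yule_a(a: int, b: int, c: int, d: int) -> float:
    """Yule's Coefficient of Association, Formula [4:1]: (ad - bc) / (ad + bc).

    a and b are the first-row cells of the fourfold table, c and d the
    second-row cells, laid out as in Table 4:10.
    """
    ad, bc = a * d, b * c
    if ad + bc == 0:
        raise ValueError("A is undefined when ad + bc = 0")
    return (ad - bc) / (ad + bc)


# Table 4:10: a = 40, b = 10, c = 10, d = 40
A = yule_a(40, 10, 10, 40)
print(round(A, 2))  # 0.88

# Interchanging the F and M columns (a <-> b, c <-> d) reverses only the sign
A_swapped = yule_a(10, 40, 40, 10)
print(round(A_swapped, 2))  # -0.88
```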

The Coefficient of Association is based upon the ratio of the difference of
the products of the frequencies in the diagonal cells to the sum of the products
of the frequencies in the diagonal cells. For dichotomized non-variable
attributes, a negative value does not really signify negative correlation. Whether
the coefficient A, as computed, is negative or positive is arbitrary,
since the result depends upon the arrangement of the respective categories
of each attribute in the fourfold table. Only in the case of cross-tabulated data
of two variable attributes does a negative sign with a correlation coefficient
signify correlation which is, in reality, negative rather than positive. It will
be recalled that positive correlation in such cases signifies a tendency for the
larger or greater values of each variable to be associated, and for the lower
values of each variable to be associated, whereas negative correlation signifies
that the larger values of one variable tend to be associated with the lower
values of the other. The implications of a coefficient of correlation for
non-variable attributes must be verbalized from the nature of the data correlated.


The Correlation of Dichotomized Variables: The Phi Coefficient 


The Coefficient of Association can also be used to index the correlation
between two variable attributes which have been dichotomized. However, a
better index of their correlation is given by the phi (φ) coefficient. As in the
case of the Coefficient of Association, φ is based upon the ratio of frequencies
in the cells of a fourfold table. The coefficient is computed as follows:


$$\varphi = \frac{bc - ad}{\sqrt{(a + b)(c + d)(a + c)(b + d)}} \qquad [4:2]$$
(φ coefficient of correlation for dichotomized non-variable attributes)

If the two attributes are both variables which have been dichotomized, a
better estimate of their correlation is made by dividing φ by the constant .637,
as follows:

$$\varphi_r = \frac{\varphi}{.637} \qquad [4:3]$$
(φ coefficient of correlation for dichotomized variates)


This correction factor yields a coefficient greater than 1.00 (the mathematical
limit of perfect correlation) if φ is greater than .637. In such cases, the result
is interpreted as approaching 1.00 as a limit.

If only one attribute is a dichotomized variable, the other attribute being
a true dichotomy of a non-variable, a better estimate of their correlation is
made by dividing φ by the constant .798, as follows:

$$\varphi_r = \frac{\varphi}{.798} \qquad [4:4]$$
(φ coefficient of correlation for a true dichotomy correlated with a dichotomized variate)
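The two corrections can be gathered into one small helper. This Python sketch is illustrative only: the function name `estimate_r_from_phi` and its boolean flag are hypothetical, the constants .637 and .798 are the book's, and the cap at 1.00 follows the rule stated above (capping negative quotients at −1.00 is our symmetric extension). The input φ of .30 in the usage lines is a made-up value.

```python
def estimate_r_from_phi(phi: float, both_variables_dichotomized: bool) -> float:
    """Estimate the product-moment r from phi (Formulas [4:3] and [4:4]).

    both_variables_dichotomized=True  -> divide by .637 (Formula [4:3])
    both_variables_dichotomized=False -> divide by .798 (Formula [4:4]); one
        attribute is a dichotomized variable, the other a true dichotomy.
    Quotients beyond +/-1.00 are reported as +/-1.00, the limit they approach.
    """
    constant = 0.637 if both_variables_dichotomized else 0.798
    estimate = phi / constant
    return max(-1.0, min(1.0, estimate))


# Hypothetical phi of .30, corrected under each assumption
print(round(estimate_r_from_phi(0.30, True), 2))   # 0.47
print(round(estimate_r_from_phi(0.30, False), 2))  # 0.38
# Any phi above .637 between two dichotomized variables is capped at 1.00
print(estimate_r_from_phi(0.70, True))             # 1.0
```

To three places the two constants equal 2/π and √(2/π), which is one way to remember them.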

That the φ coefficient for dichotomized non-variable attributes yields an
index of correlation which differs from the Coefficient of Association may be
shown with the data in Table 4:10. For these data φ is as follows:

$$\varphi = \frac{(10)(10) - (40)(40)}{\sqrt{(40 + 10)(10 + 40)(40 + 10)(10 + 40)}} = \frac{-1500}{\sqrt{6{,}250{,}000}} = \frac{-1500}{2500} = -.60$$

This value of .60 (the negative sign is dropped as irrelevant) for the φ index of
correlation between the two non-variable attributes in Table 4:10 is a measure
of correlation which is more analogous in its implications to the measures of
correlation for variates (cf. Pearson’s product-moment coefficient, Chapter 9)
than is the value of .88 obtained by the Coefficient of Association method.
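The contrast between .60 and .88 can be reproduced with a short Python sketch of Formula [4:2]. The name `phi_coefficient` is ours; the numerator keeps the book's bc − ad ordering, so for Table 4:10 the result carries the negative sign that the text sets aside as irrelevant.

```python
from math import sqrt


def phi_coefficient(a: int, b: int, c: int, d: int) -> float:
    """Phi coefficient, Formula [4:2]: (bc - ad) / sqrt((a+b)(c+d)(a+c)(b+d))."""
    denominator = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denominator == 0:
        raise ValueError("phi is undefined when any row or column total is zero")
    return (b * c - a * d) / denominator


# Table 4:10: a = 40, b = 10, c = 10, d = 40
phi = phi_coefficient(40, 10, 10, 40)
A = (40 * 40 - 10 * 10) / (40 * 40 + 10 * 10)  # Formula [4:1], for comparison
print(phi)                              # -0.6
print(round(abs(phi), 2), round(A, 2))  # 0.6 0.88
```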

Whenever it is logical to assume that one or both of the attributes being
correlated is a variable rather than a non-variable, φ should be used instead
of A. The φ correlation of the dichotomized data of the variable attributes in
Table 4:1 is computed as indicated in Table 4:11. The φ coefficient for these
data is found to be .90. However, both the attributes which are dichotomized
in this table are in reality variables. Therefore it is relevant to estimate r
(the product-moment correlation coefficient) for this φ value. By Formula 4:3,
$\varphi_r = \varphi/.637$. Since φ is greater than .637 and the correction, if applied, would
yield an estimated r value in excess of 1.00, we can conclude that $\varphi_r$ approaches
1.00 as a limit and report the estimate as 1.00.

Table 4:11. φ Correlation of Listener Attitudes for Two Sequences of a Radio Program

                                     Attitudes Toward Later Sequence
                                        −              +             nr
Attitudes Toward        +            a = 2          b = 29           31
Earlier Sequence        −            c = 27         d = 1            28
                        nc              29             30         N = 59

$$\varphi = \frac{(29)(27) - (2)(1)}{\sqrt{(2 + 29)(27 + 1)(2 + 27)(29 + 1)}} = \frac{783 - 2}{\sqrt{(31)(28)(29)(30)}} = \frac{781}{\sqrt{755{,}160}} = \frac{781}{869.0} = .899, \text{ or } .90$$
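The arithmetic for Table 4:11, followed by the Formula [4:3] correction discussed in the text, can be checked with the Python sketch below. The variable names are illustrative; the cell letters follow the table, whose columns run − then + for the later sequence.

```python
from math import sqrt

# Table 4:11 cells. Rows: attitude toward the earlier sequence (+, then -).
# Columns: attitude toward the later sequence (-, then +).
a, b = 2, 29   # earlier +: later -, later +
c, d = 27, 1   # earlier -: later -, later +

# Formula [4:2]
phi = (b * c - a * d) / sqrt((a + b) * (c + d) * (a + c) * (b + d))
print(round(phi, 3))  # 0.899

# Formula [4:3]: both attitudes are dichotomized variables, so divide by .637.
# The quotient exceeds 1.00, so the estimate is reported as 1.00.
phi_r = min(phi / 0.637, 1.0)
print(round(phi / 0.637, 2), phi_r)  # 1.41 1.0
```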


The Correlation of Polytomous Attributes: The Contingency Coefficient

Methods have also been developed for computing an index of correlation
for the cross-tabulated data of attributes divided into more than two broad
classes. The method most commonly used for this purpose is the one developed
by Karl Pearson. It gives a statistical index of correlation called the
Coefficient of Mean Square Contingency. In practice, this coefficient is symbolized
by C and is referred to as the Contingency Coefficient.

The computation of C will be described for the data in Table 4:12, which
were derived from the cross-tabulation in Table 4:6 and represent the relation
between economic status and respondents’ opinions concerning private vs.
government management of business companies. It will be recalled that the
attribute income status is definitely a variable differentiated not on a
continuous, quantitative scale, but into four broad classes ranging from lowest
to highest. It will also be recalled that the other attribute (having an opinion
on the question) may also be construed as a variable representing a
dichotomization of all shades of opinion ranging from strong convictions in favor of
private management to very weak or no convictions in favor of private
management (with government management presumably as the alternative).
In any event, the data in Table 4:6 yield a 2 by 4 or eightfold cross-tabulation
of two attributes whose data are arranged categorically. The percentage
values of each cell in Table 4:9 have been reconverted to frequencies in
Table 4:12, inasmuch as the contingency method of correlation is based upon
frequencies rather than proportions.

Table 4:12. Relation Between Economic Status and Respondents' Opinions About
Private vs. Government Management of Business (from Data in Table 4:6)

                                          Economic Groups
                           Low        Lower       Upper        High          nr
                                      Middle      Middle
Private Management       a = 230     b = 660     c = 570     d = 225       1685
Government Management    e = 120     f = 140     g = 68      h = 13         341
nc                          350         800         638         238    N = 2026

The Contingency Coefficient itself is equal to the following ratio:

$$C = \sqrt{\frac{S - N}{S}} \qquad [4:5]$$
(Pearson’s Coefficient of Mean Square Contingency)


where S is equal to the sum of the ratios obtained in the last column of 
Table 4:13 and N is the total number of correlational frequencies or paired 
observations used in the cross-tabulation of the two attributes. 


Steps in Computing C (Table 4:13)

Column 2: List the statistical frequencies obtained for each cell of the
cross-tabulation shown in Column 1. These are the obtained frequencies ($f_o$).

Column 3: Square each of the cell frequencies to get $f_o^2$.

Columns 4 and 5: Compute “independence values” for each cell. These values
give the hypothetical frequency ($f_h$) for each cell to be expected on the basis
of chance according to the total number of frequencies of the column ($n_c$)
and row ($n_r$) in which the cell is located. The hypothetical “independence
value” of any cell is equal to the following ratio:

$$f_h = \frac{n_c n_r}{N} \qquad [4:6]$$
(Hypothetical frequency value for any cell of a correlation chart, on the
assumption of independence between attributes)

where $n_r$ is equal to the number of frequencies for the row in which the cell
is located, $n_c$ is equal to the number of frequencies for the column in which
the cell is located, and N is equal to the total number of frequencies obtained
and used in the cross-tabulation. Column 4 of Table 4:13 gives the product
$n_c n_r$ and Column 5 the quotient $f_h$. The sum of these “independence values”
($N_h$) should equal (except for dropped decimals) the total number of correlation
frequencies ($N_o$).

Column 6: Compute for each cell the ratio of its squared frequency value
($f_o^2$ of Column 3) to its theoretical “independence value” ($f_h$). These ratios are
presented in Column 6. Sum these ratios to obtain the value of S for the
computation of C.
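The independence values of Formula [4:6] can be computed for all eight cells at once. The following Python sketch uses the frequencies of Table 4:12 (rows: private, then government management; columns: the four economic groups from Low to High); the variable names are illustrative only.

```python
# Table 4:12 obtained frequencies f_o. First row: Private Management;
# second row: Government Management. Columns: Low, Lower Middle,
# Upper Middle, High economic groups.
observed = [
    [230, 660, 570, 225],
    [120, 140, 68, 13],
]

N = sum(sum(row) for row in observed)              # total frequencies
row_totals = [sum(row) for row in observed]        # n_r for each row
col_totals = [sum(col) for col in zip(*observed)]  # n_c for each column

# Formula [4:6]: f_h = n_c * n_r / N, one value per cell
expected = [[n_c * n_r / N for n_c in col_totals] for n_r in row_totals]

print(N, row_totals, col_totals)  # 2026 [1685, 341] [350, 800, 638, 238]
for row in expected:
    print([round(f_h, 1) for f_h in row])
# [291.1, 665.4, 530.6, 197.9]
# [58.9, 134.6, 107.4, 40.1]

# Check: the independence values sum to N (N_h = N_o) apart from rounding
print(round(sum(sum(row) for row in expected), 1))  # 2026.0
```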

The contingency coefficient for the data in Table 4:13 is computed at the 
bottom of the table and is found to be .23. 


Table 4:13. Computation of C, the Contingency Coefficient, from
Cross-Tabulation of Table 4:12

  (1)       (2)             (3)         (4)                 (5)                 (6)
 Cells   fo (Obtained       fo²      (nc)(nr)        (nc nr)/No = fh         fo²/fh
         Frequencies)
   a         230          52,900     350(1685)      589,750/2026 = 291.1      181.7
   b         660         435,600     800(1685)    1,348,000/2026 = 665.4      654.6
   c         570         324,900     638(1685)    1,075,030/2026 = 530.6      612.3
   d         225          50,625     238(1685)      401,030/2026 = 197.9      255.8
   e         120          14,400     350(341)       119,350/2026 =  58.9      244.5
   f         140          19,600     800(341)       272,800/2026 = 134.6      145.6
   g          68           4,624     638(341)       217,558/2026 = 107.4       43.1
   h          13             169     238(341)        81,158/2026 =  40.1        4.2
         No = 2026                                (check) Nh = 2026.0     S = 2141.8

$$C = \sqrt{\frac{2141.8 - 2026}{2141.8}} = \sqrt{.0541} = .2326 = .23$$
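The steps of Table 4:13 translate directly into a short Python function. This is a sketch, and `contingency_coefficient` is a name we have chosen. Carried out without rounding the intermediate values, the computation gives C = .2325 rather than the .2326 shown above, a difference due only to the rounding of $f_h$ and of the quotient under the radical; both round to .23. Note also that S − N equals Pearson's chi-square (χ²) for the table, so the same coefficient can be written $C = \sqrt{\chi^2/(\chi^2 + N)}$.

```python
from math import sqrt


def contingency_coefficient(observed: list[list[int]]) -> float:
    """Pearson's Contingency Coefficient, Formula [4:5]: C = sqrt((S - N) / S).

    observed holds the obtained frequencies f_o, one inner list per row.
    """
    N = sum(sum(row) for row in observed)
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    S = 0.0
    for n_r, row in zip(row_totals, observed):
        for n_c, f_o in zip(col_totals, row):
            f_h = n_c * n_r / N   # Formula [4:6] (Columns 4 and 5)
            S += f_o ** 2 / f_h   # Column 6
    return sqrt((S - N) / S)


# Table 4:12: private management (first row), government management (second row)
table_4_12 = [
    [230, 660, 570, 225],
    [120, 140, 68, 13],
]
C = contingency_coefficient(table_4_12)
print(round(C, 4))  # 0.2325
```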


How this correlation is interpreted depends upon the way in which the
data are distributed in the cells in Table 4:12. The coefficient C is written
without a sign. The arrangement of the attributes of the table and of the
data in each cell, however, indicates the direction of the correlation, viz., the
higher the economic status, the greater the tendency for the respondents’
opinions to favor private management of business, and conversely, the lower
the economic status the greater the tendency to favor government
management. That this is the case is emphasized by comparing the actual, obtained
frequencies per cell ($f_o$) with the theoretical number of frequencies for each,
as shown in Table 4:14. Cells a and h, for example, have fewer obtained than
hypothetical frequencies, whereas cells d and e have more obtained than
hypothetical frequencies. These are the extremes of the paired observations
and such a trend of the frequencies in the diagonally located cells is, as we
have seen, indicative of correlation. However, the distribution of the obtained
frequencies in relation to the hypothetical frequencies for all eight cells needs
to be considered. This is done in computing C, which is equal to only .23.
This degree of correlation is not very marked; at best it is indicative only of
a tendency to a relationship between the attributes correlated.

Table 4:14. Comparison of Obtained Frequencies (fo) with Hypothetical
Frequencies (fh) for Data in Table 4:13

                                          Economic Groups
                          Low          Lower Middle    Upper Middle    High
Private Management        a            b               c               d
                          fo = 230     fo = 660        fo = 570        fo = 225
                          fh = 291     fh = 665        fh = 531        fh = 198
Government Management     e            f               g               h
                          fo = 120     fo = 140        fo = 68         fo = 13
                          fh = 59      fh = 135        fh = 107        fh = 40

Mathematical Limits of C and Estimates of $C_{cor}$

There is a limit to the computed value of a Contingency Coefficient when the
attributes correlated are divided into only a few broad classes. The value of C is not affected by
variations in the total number of frequencies (provided they are considerable),
but it is affected by the number of cells used in the cross-tabulation of two
attributes. Yule and Kendall* have presented the maximum possible values
of C for the cross-tabulations of attributes, each of which is divided into the
same number of categories. These values, which are given in Table 4:15, are
useful in correcting C values to obtain better estimates of the degree of
correlation.

Table 4:15. The Maximum Values of C for Correlated Attributes Divided
into the Same Number of Categories

Cross-tabulation      C cannot exceed
  2 by 2-fold              .707
  3 by 3-fold              .816
  4 by 4-fold              .866
  5 by 5-fold              .894
  6 by 6-fold              .913
  7 by 7-fold              .926
  8 by 8-fold              .935
  9 by 9-fold              .943
 10 by 10-fold             .949


The maximum possible computed value for C derived from a fourfold
(2 by 2) cross-tabulation is .707. A better estimate of correlation by means
of the Contingency Coefficient can be obtained by dividing C by .707, when
C is derived from a fourfold table. Similarly, an estimate from a 3 by 3 table
can be obtained by dividing C by .816. Generally, C is most satisfactory for
attributes polytomized into 5 by 5 or more divisions, provided, however, the
subdivisions are not too fine (not greater than 9 or 10).

* Yule and Kendall, op. cit., p. 69.
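The entries of Table 4:15 agree with the expression √((k − 1)/k) for the largest C obtainable from a k by k table, a standard result that we state here without derivation. A minimal Python sketch reproduces the table and applies the correction described above; the helper names and the value C = .50 in the last line are hypothetical.

```python
from math import sqrt


def max_contingency(k: int) -> float:
    """Largest possible C for a k by k cross-tabulation: sqrt((k - 1) / k)."""
    if k < 2:
        raise ValueError("each attribute needs at least 2 categories")
    return sqrt((k - 1) / k)


def corrected_contingency(C: float, k: int) -> float:
    """Divide C by its maximum for a k by k table, as recommended in the text."""
    return C / max_contingency(k)


# Reproduce Table 4:15
for k in range(2, 11):
    print(f"{k} by {k}-fold: C cannot exceed {max_contingency(k):.3f}")

# Hypothetical C = .50 from a 3 by 3 table: .50 / .816
print(round(corrected_contingency(0.50, 3), 2))  # 0.61
```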


EXERCISES 

1. Set up a fourfold table for the following data, determine the degree of correlation 
by means of (a) the Coefficient of Association and (b) the phi coefficient, and 
interpret your results: 

Of a total group of 400 adults, 150 are men; and 175 of the total group belong 
to labor unions. Of the 175 who belong to labor unions, 100 are men. 

2. Set up a correlation table for the following data, determine the degree of
correlation in terms of the Contingency Coefficient, and interpret your result:

Of a total group of 3000 people, 500 graduated from high school and had at
least some college education; 1500 graduated from high school but had no college
education; and 800 had some high school education but did not graduate. Of
the 500 with some college education, 100 were in favor of “large families” (as
opposed to “small families”); of the 1500 high-school graduates with no college
education, 450 favored “large families”; and of the 800 with some high-school
education, 320 favored “large families.” The total number favoring “small
families” was 1980. (Note that part of the total group had no high-school
education.)



CHAPTER 5 


The Reduction and Organization of 
Variate Data 

A. INTRODUCTION 

This chapter will present the initial methods necessary for the reduction and 
organization of the data of variable attributes or qualities. They include 
procedures for rearranging the raw data of a variable into an ordered structure 
that will compactly portray the character of the distribution of data. Whether 
the form of a variable is similar to the normal, bell-shaped distribution, or 
some other type, can usually be ascertained from a graph of the frequency 
distribution of the variable. Additional methods of statistics to be used in the 
description of a result will in part depend upon the type of distribution that 
a variable yields. We shall be concerned here with the following: 

1. The range and array. 

2. The frequency distribution. 

3. The histogram and the frequency polygon. 

4. The percentage frequency distribution and polygon. 

5. The cumulative and percentage cumulative frequency distributions. 

B. THE RANGE AND ARRAY 

The first step in the organization of variate data consists in determining
the range. The numerical values of the lowest and highest (or smallest and
largest) scores in a group of variate data constitute the range. It is readily
determined by inspection for a group of only 100 or so cases. Looking through
the intelligence test scores in Table 5:1, we see that the highest is 100 and the
lowest is 3. The range is therefore 3 to 100.

Table 5:1. Intelligence Test Scores of 100 College Freshmen

  49    66    66    86    75    34    21    12    58    17
  34    30    52    56    67    58    80    40    21    17
  73    56    13    40    79    73    65    61    43    30
  85    21    40    66    14    75    91    65    50    38
  85    94    26    56    76    24    71    73   100    53
  30    11    40    64    38    56    10    11     3    59
  62    52    61    76    11    39    99    52    19    73
  24    77    58    44    36    26    38    15    64    63
  19    45    42    64    31    48    62    89    60     8
  76    21    89    47    98    29    47    63    91    32

We employ the same procedure for the data in Table 5:2, putting the largest
negative values at the lowest end of the scale. The largest score is 126 and
the smallest score (largest negative value) is −165; the range is therefore
−165 to 126.*


Table 5:2. Bernreuter “Sociability” Scores of 100 College Freshmen

   53  -103   -65  -108   -31  -154   -49   -31   -37   -22
   42   -56   126   -69     1   -33   -93   -25    -5   -77
   -3   -49   -87   -91   -50   116     5    60   -30   -83
  -66  -113   -42   -76  -132    63   -95   -70   104   -17
 -137    24   -79   -29     0   -53   -45     1    30    -5
  -78   -69   -21   -37  -106   -19   -17   -14   -58   -52
  -94   -13   -27    -8    43   -67   -51  -120   -22   -60
  -87  -124   -51   -97    39  -104   -86   -93   -30   -40
 -165   -30  -131    63     6   -39   -65   -18     8    64
  -45  -101   -86   -41   -49   -77   -57    -8   -29    -3

Table 5:3. Strong Interest Ratings for “Physician” of 100 College Freshmen

 C+   C    C    C    C    B+   B    A    C+   C+
 C+   B-   B    B-   B    B-   C+   C    C+   C
 C    C    B    C    C    B-   B    B-   C    C+
 C    C    B    C+   B-   C    C    C+   C+   C
 B    B    C    C    C    C    B-   C    B    C
 C+   C+   B    C    C+   B+   C    C+   C    C
 B+   B    C    B-   C    A    B+   C    C    C+
 C+   C    C    B-   B    B    C    B+   C+   C
 C    C+   C+   C    C+   B    C    C    C+   C+
 C+   C    C    C    C    C+   B    C    C    B+





The data in Table 5:3 consist of interest ratings that have been converted
from numerical scores to broad classifications (hence, categorized) on a letter
scale, with order of interest pattern represented by the order of the alphabet.
Most interest is signified by the first letter of the alphabet, A, and least
interest by C. The table is found on inspection to include both A and C ratings;
the range is therefore C to A.

The range thus gives the outside limits of the numerical values or ratings 
present in a distribution of variate data. Not only is it a valuable aid in the 
initial steps of constructing the frequency distribution, but it also provides 
an index of the spread, or dispersion, of the scores, i.e., it provides an index of 
the extent to which the measures differ. 
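Finding the range by inspection amounts to taking the smallest and largest values, which a minimal Python sketch makes explicit for Tables 5:1 and 5:2. The function name `score_range` is ours; the score lists are transcribed from the two tables.

```python
# Table 5:1: intelligence test scores of 100 college freshmen
intelligence = [
    49, 66, 66, 86, 75, 34, 21, 12, 58, 17,
    34, 30, 52, 56, 67, 58, 80, 40, 21, 17,
    73, 56, 13, 40, 79, 73, 65, 61, 43, 30,
    85, 21, 40, 66, 14, 75, 91, 65, 50, 38,
    85, 94, 26, 56, 76, 24, 71, 73, 100, 53,
    30, 11, 40, 64, 38, 56, 10, 11, 3, 59,
    62, 52, 61, 76, 11, 39, 99, 52, 19, 73,
    24, 77, 58, 44, 36, 26, 38, 15, 64, 63,
    19, 45, 42, 64, 31, 48, 62, 89, 60, 8,
    76, 21, 89, 47, 98, 29, 47, 63, 91, 32,
]

# Table 5:2: Bernreuter "Sociability" scores of 100 college freshmen
sociability = [
    53, -103, -65, -108, -31, -154, -49, -31, -37, -22,
    42, -56, 126, -69, 1, -33, -93, -25, -5, -77,
    -3, -49, -87, -91, -50, 116, 5, 60, -30, -83,
    -66, -113, -42, -76, -132, 63, -95, -70, 104, -17,
    -137, 24, -79, -29, 0, -53, -45, 1, 30, -5,
    -78, -69, -21, -37, -106, -19, -17, -14, -58, -52,
    -94, -13, -27, -8, 43, -67, -51, -120, -22, -60,
    -87, -124, -51, -97, 39, -104, -86, -93, -30, -40,
    -165, -30, -131, 63, 6, -39, -65, -18, 8, 64,
    -45, -101, -86, -41, -49, -77, -57, -8, -29, -3,
]


def score_range(scores: list[int]) -> tuple[int, int]:
    """Return the range as (lowest, highest); negative values sort lowest."""
    return min(scores), max(scores)


print(score_range(intelligence))  # (3, 100)
print(score_range(sociability))   # (-165, 126)
```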


* What “most” and “least” mean on the Bernreuter scale, or any other scale of
variable quantities, is ultimately a question of functional analysis; here we are concerned only
with reducing such data to the form of a frequency distribution.

The Range as a Comparative Measure 

The range as an index of spread or dispersion of the data of a variable is 
sometimes used to compare the variation of scores or ratings of one group of 
individuals with the variation of another group. If the data for each group 
are for the same variable—are obtained by means of the same test or rating 
device—this comparison procedure may be useful. However, it is obviously 
meaningless to compare the ranges of the three variables in Tables 5:1, 5:2, 
and 5:3. The fact that these ranges are 

3 to 100 (Table 5:1) 

-165 to 126 (Table 5:2) 

C to A (Table 5:3) 

is a summarizing item of information about each, but not a comparative
one. But if a second group of college freshmen shows scores for each of these
variables with the following ranges:

6 to 91
−100 to 53
C to B

the differences between this second group of freshmen and the first one
suggest that the second group is not as variable as the first in the functions or
traits differentiated.

It should be emphasized, however, that we cannot place too much
confidence in differences in dispersion measured by the values of the ranges of two
or more distributions of data, because the range tells nothing about the
internal organization of the series of scores or ratings. For example, the bulk
of scores in each distribution being compared may have about the same
dispersion of measures over the scale, despite different ranges. Furthermore,
range values are likely to be too erratic or scattered as limiting points of
groups of data to serve usefully as comparative or even summary measures
of dispersion for the groups as a whole. Even if the ranges of two groups of
scores represent the maximum and minimum values that can be obtained on
a test or rating scale (by virtue of the nature of the method employed in
scoring the result), there still remains the possibility that the internal
structure of each group of scores is markedly different.

The Array 

A simple but usually not the most useful or easiest method of organizing
the data of variables is to rearrange all the scores of each group in order of
size or order of rating. This constitutes an array. The data in the preceding
three tables have been rearranged into arrays in Tables 5:4, 5:5, and 5:6.

Table 5:4. Array of Intelligence Test        Table 5:5. Array of Bernreuter
Scores (Data in Table 5:1 Rearranged         Scores (Data in Table 5:2 Rearranged
in Order of Size)                            in Order of Size)

 100    66    52    30                        126   -13   -45    -79
  99    66    52    30                        116   -14   -45    -83
  98    66    50    29                        104   -17   -49    -86
  94    65    49    26                         64   -17   -49    -86
  91    65    48    26                         63   -18   -49    -87
  91    64    47    24                         63   -19   -50    -87
  89    64    47    24                         60   -21   -51    -91
  89    64    45    21                         53   -22   -51    -93
  86    63    44    21                         43   -22   -52    -93
  85    63    43    21                         42   -25   -53    -94
  85    62    42    21                         39   -27   -56    -95
  80    62    40    19                         30   -29   -57    -97
  79    61    40    19                         24   -29   -58   -101
  77    61    40    17                          8   -30   -60   -103
  76    60    40    17                          6   -30   -65   -104
  76    59    39    15                          5   -30   -65   -106
  76    58    38    14                          1   -31   -66   -108
  75    58    38    13                          1   -31   -67   -113
  75    58    38    12                          0   -33   -69   -120
  73    56    36    11                         -3   -37   -69   -124
  73    56    34    11                         -3   -37   -70   -131
  73    56    34    11                         -5   -39   -76   -132
  73    56    32    10                         -5   -40   -77   -137
  71    53    31     8                         -8   -41   -77   -154
  67    52    30     3                         -8   -42   -78   -165

Table 5:6. Array of Strong Interest Ratings
(Data in Table 5:3 Rearranged in Order of Rating)

 A    B    B-   B-   C+   C+   C    C    C    C
 A    B    B-   B-   C+   C+   C    C    C    C
 B+   B    B-   C+   C+   C+   C    C    C    C
 B+   B    B-   C+   C+   C+   C    C    C    C
 B+   B    B-   C+   C+   C+   C    C    C    C
 B+   B    B-   C+   C+   C+   C    C    C    C
 B+   B    B-   C+   C+   C    C    C    C    C
 B+   B    B-   C+   C+   C    C    C    C    C
 B    B    B-   C+   C+   C    C    C    C    C
 B    B    B-   C+   C+   C    C    C    C    C

Two general characteristics of such arrays are perhaps evident from these
three tables. (1) The array does not provide a very useful type of organization;
with large groups of data the result hardly warrants the labor required. (2) The
character of the results, especially in Table 5:6, suggests a method of
organizing variate data that not only is more satisfactory but is also more commonly
used, viz., the frequency distribution. Since the ratings in Table 5:6 consist of only
a few letters, the total number of cases in each category of A’s, B+’s, B’s,
B−’s, C+’s, and C’s can be quickly derived from the array to yield the frequency
distribution shown in Table 5:7.
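Counting the categories of an array is a one-step job for a counter. The Python sketch below transcribes Table 5:6 row by row (writing B− as `B-`) and tallies it with `collections.Counter` from the standard library; the result should agree with the frequencies of Table 5:7. The variable names are illustrative.

```python
from collections import Counter

# Table 5:6, row by row ("B-" stands for B minus)
table_5_6 = [
    "A  B  B- B- C+ C+ C C C C",
    "A  B  B- B- C+ C+ C C C C",
    "B+ B  B- C+ C+ C+ C C C C",
    "B+ B  B- C+ C+ C+ C C C C",
    "B+ B  B- C+ C+ C+ C C C C",
    "B+ B  B- C+ C+ C+ C C C C",
    "B+ B  B- C+ C+ C  C C C C",
    "B+ B  B- C+ C+ C  C C C C",
    "B  B  B- C+ C+ C  C C C C",
    "B  B  B- C+ C+ C  C C C C",
]

ratings = [rating for row in table_5_6 for rating in row.split()]
frequencies = Counter(ratings)

# List the classes from most to least interest, as in Table 5:7
scale = ["A", "B+", "B", "B-", "C+", "C"]
for rating in scale:
    print(f"{rating:<3}{frequencies[rating]:>4}")
print("N =", sum(frequencies.values()))  # N = 100
```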

Table 5:7. Frequency Distribution of Data in Table 5:6
(Physician: Strong Interest Inventory)

Interest Rating    Frequency
      A                 2
      B+                6
      B                12
      B-               12
      C+               24
      C                44
                  N = 100

This arrangement obviously summarizes the data in Table 5:3 in a way
greatly superior to the array shown in Table 5:6. However, as will be seen
in the next section, the procedures used in deriving this frequency distribution
involve much more labor and time than is necessary. Instead of first ordering
all the data into arrays, we can greatly simplify the procedure by tallying the
original unordered group of data into appropriate classes (or class intervals).*
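A single pass through the unordered scores is all that such tallying requires. In the Python sketch below, the interval width of 10 score units and the choice to begin each interval at a multiple of 10 are illustrative assumptions (the choice of intervals is taken up in the next section); integer division assigns each score of Table 5:1 to its interval without any sorting.

```python
WIDTH = 10  # illustrative class-interval size, in score units

# Table 5:1, in its original (unordered) sequence
intelligence = [
    49, 66, 66, 86, 75, 34, 21, 12, 58, 17,
    34, 30, 52, 56, 67, 58, 80, 40, 21, 17,
    73, 56, 13, 40, 79, 73, 65, 61, 43, 30,
    85, 21, 40, 66, 14, 75, 91, 65, 50, 38,
    85, 94, 26, 56, 76, 24, 71, 73, 100, 53,
    30, 11, 40, 64, 38, 56, 10, 11, 3, 59,
    62, 52, 61, 76, 11, 39, 99, 52, 19, 73,
    24, 77, 58, 44, 36, 26, 38, 15, 64, 63,
    19, 45, 42, 64, 31, 48, 62, 89, 60, 8,
    76, 21, 89, 47, 98, 29, 47, 63, 91, 32,
]

tally: dict[int, int] = {}
for score in intelligence:            # one pass, no array needed
    lower = (score // WIDTH) * WIDTH  # lower integral limit of the score's interval
    tally[lower] = tally.get(lower, 0) + 1

for lower in sorted(tally, reverse=True):  # highest interval first
    print(f"{lower:>3}-{lower + WIDTH - 1:<3} {tally[lower]:>3}")
```

With intervals beginning at 0, the 100 scores fall into 11 classes, running from 100-109 (1 case) down to 0-9 (2 cases).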

C. THE FREQUENCY DISTRIBUTION 

The structure of a group of measures or ratings for a variable is readily 
revealed by the construction of a frequency distribution and a graph of the 
results. Such a procedure requires that the data be tallied into appropriate 
classes or class intervals. 


The Class Interval 

The first problem that ordinarily arises in the construction of a frequency 
distribution involves the selection of appropriate class intervals for the data. 
Sometimes the classes to be used are evident, and no further consideration is
necessary. This was true of the data in Tables 5:3 and 5:6, whose frequency 
distribution was shown in Table 5:7. 

More often, however, the class intervals are not so readily indicated by the
original data. A case in point is the group of intelligence test data in Tables 5:1
and 5:4, the scores of which had a range of from 3 to 100. The Bernreuter data
in Tables 5:2 and 5:5, with a range of from −165 to 126, are another example.
For each group of data of these two variables, there were 100 cases. In order
to have a frequency distribution that will give a picture of the whole which
will be more meaningful and useful than the arrays in Tables 5:4 and 5:5,
class intervals that include more than one score possibility must be set up. If
this is not done, and the integral values of all possible scores within the
range are taken as the class intervals, there will be too many null classes,
i.e., class intervals with no frequencies. Thus, for the Bernreuter data, there
are 292 score possibilities, since the range is −165 to 126, and 165 + 126 + 1
(for zero) equals 292. But there were only 100 cases in the group, and
therefore not every possible integral score value could be represented by a frequency.

* The subdivisions of a quantitatively distributed variable are usually described as
classes, or class intervals, whereas the subdivisions of a non-variable, or of a non-quantitative
variable, are usually described as categories.

The usual procedure in selecting class intervals for a variable whose range
is equal to or larger than the number of cases in a fairly good-sized group
(N equal to or greater than 100) is to establish class intervals of a size that
will yield from 12 to 20 classes in the frequency distribution. There is,
however, nothing magical in this particular choice. In practice, it is usually not
necessary to have more than 20 class intervals unless the size of the groups is
very large. On the other hand, if fewer than 12 class intervals are used, it may
be necessary to make certain corrections in the results for computing some
of the measures used in both descriptive and sampling statistics.*

Determining the Range or Size of a Class Interval 

The easiest way to establish the range of class intervals so as to have from
12 to 20 intervals, each equal in size, is to divide the total number of
different score possibilities (which is equal to the difference between the extreme
values of the distribution as given by the range, plus one) by 12 or 15 or 20,
or by any other number between 12 and 20, according to the number of
intervals desired.

In the intelligence test scores in Table 5:1, the total number of score
possibilities is 98, since the range of the scores is from 3 to and including 100. If
a minimum of 12 class intervals is desired, 98 is divided by 12, and a rounded
value of 8 is obtained. There will therefore be 12 or 13 class intervals with a
range of 8 score units, each interval equal in size, for a frequency distribution
of these data. In practice, however, class intervals for integral score values
are more often taken for convenience as equal to 5 or 10 units. If class intervals
equal to 10 score units are used for the intelligence test data in Table 5:1,
there will be only 10 (or 11) classes. On the other hand, if class intervals
equal to 5 score units are used, there will be 20 (or 21) classes. The choice will
depend upon the general purpose underlying the statistical treatment of the
original data in the investigation. If the research worker is mainly concerned
with developing a frequency distribution to portray the structure of the group
result as a whole, then the 10 or 11 class intervals of 10 units each will serve
better than 20 or 21 intervals of 5 units each, since the total number of cases
is only 100.
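The arithmetic of this section can be collected into two small Python helpers. Both are sketches with names of our own choosing. `class_count` assumes, as one convenient convention, that each interval begins at a multiple of the interval width, which is why it returns the larger figure of each pair quoted above (13, 11, and 21); other starting points can save one interval.

```python
def score_possibilities(lowest: int, highest: int) -> int:
    """Number of integral score values from lowest to highest, both included."""
    return highest - lowest + 1


def class_count(lowest: int, highest: int, width: int) -> int:
    """Number of class intervals of the given width needed to cover the range,
    assuming each interval begins at a multiple of the width."""
    return highest // width - lowest // width + 1


# Intelligence test scores, Table 5:1 (range 3 to 100)
n = score_possibilities(3, 100)
print(n, round(n / 12, 2))                    # 98 8.17
for width in (8, 10, 5):
    print(width, class_count(3, 100, width))  # prints 8 13, then 10 11, then 5 21

# Bernreuter scores, Table 5:2 (range -165 to 126)
m = score_possibilities(-165, 126)
print(m, round(m / 15, 1))                    # 292 19.5
print(class_count(-165, 126, 20))             # 16
```

For the Bernreuter data the same convention gives 16 intervals of 20 units, in line with the text's "approximately 15 classes". Python's `//` rounds toward minus infinity, so negative scores such as −165 fall into the interval −180 to −161 as intended.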

In the case of the Bernreuter data in Tables 5:2 and 5:5, the range was
found to be −165 to 126, and the total number of different score possibilities
was 292. Dividing 292 by 15 gives 19.5, and therefore intervals with a range of
20 score units will give approximately 15 classes. For convenience in notation
and tabulation, class intervals with a range of more than 10 units are usually
set up to the nearest multiple of 5, as for example 15, 20, 25, etc. This is the
case for intervals up to 25 or 30 units. For intervals of greater size, ranges
of 50, 75, or 100 units are usually employed.

* Cf. Sheppard’s correction for the standard deviation derived from broad classes,
chap. 7.

The first thing to do, then, in making a frequency distribution is to lay 
out the range of score possibilities in successive class intervals of a convenient 
and appropriate size. If too many class intervals are used for groups of data 
with only 25 or 50 or even 100 cases, there will be too many null classes and 
the picture of the structure of the group as a whole will not be satisfactory. 
Too many class intervals used for the data in Tables 5:1 and 5:2 would be 
little improvement over the arrays shown in Tables 5:4 and 5:5. On the other 
hand, it is apparent that the interest data in Tables 5:3, 5:6, and 5:7 offer no
initial problem of class arrangement, because the total number of different 
ratings is only six and hence there can be no more than six classes. 

The Mathematical Limits of a Class Interval * 

Whatever class intervals may be chosen, it is essential to know the precise 
mathematical limits of the successive class intervals used for a frequency 
distribution of a variable. This problem may not seem to be distinct from
that of determining the size or range of a class interval. However, the
distinction is likely to become apparent as soon as one begins to tabulate the
original data into their respective class intervals. Furthermore, if any
statistical computations are to be made from the tabulated data, it is
essential to know the mathematical limits of each interval, because either the
mathematical values of the class-interval limits or the values of the interval
mid-points are needed in computing most statistical measures of a variable.

The problem of the mathematical limits of class intervals will be illustrated
by the tabulation of a group of age data. For convenience, let us assume
that such data are to be tabulated in class intervals of one year each, and
that the data range from 5 to 15 years. If the highest class interval is set up
for those cases 15 years of age, the next highest class interval will include
those 14 years of age, etc. If we assume further that such data form a
continuum or continuous series of age possibilities, and if the first case to be
tabulated is 14 years and 9 months old, the question arises as to which of these
two classes shall be used for the tally. The answer depends upon what is taken
as the class limits of each interval. Sometimes these limits are taken a half
year below and a half year above the integral age value. In this case, the
mathematical limits of the 15-year class interval will be 14.5 to, but not
including, 15.5. Similarly, for the 14-year class interval, the limits will be 13.5
to 14.5, etc. In other words, the mathematical limits of such class intervals
would be six months preceding and six months following each integral year
value. With these limits, a person 14 years and 9 months of age would be
tallied in the 15-year interval.

* For a more detailed discussion of this problem, see J. G. Peatman, “On the Meaning of a
Test Score in Psychological Measurement,” American Journal of Orthopsychiatry, 9:23-29,
1939.

Age data, however, are often collected in such a way that the class limits just
described would be incorrect. Thus, if ages are reported as of each
person's last birthday, such data will have to be tabulated with respect to
class intervals whose lower limit is the integral years of birthday age. Those
reporting an age of 15 years would thus be in a class interval whose limits
begin at 15.0 years and range to, but do not include, 16.0 years. Similarly,
an age of 14 years would be in the class interval ranging from 14.0 to 15.0
years.

If an investigator wishes to obtain age data which can be correctly tabulated
in year intervals ranging from a half year below to a half year above
the year age, he will ask individuals to give their age in years as of their
nearest birthday, instead of their age as of their last birthday. Everyone
reporting his age as 15 years would then be in the 14.5 to 15.5 range.
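A minimal sketch of the two age conventions, assuming ages are held as decimal years (the function names are hypothetical). Under the nearest-birthday convention the class labelled k runs from k - 0.5 up to, but not including, k + 0.5; under the last-birthday convention it runs from k up to, but not including, k + 1.

```python
import math

def age_class_nearest(age_years: float) -> int:
    """Class label when ages are reported as of the nearest birthday."""
    # Class k covers k - 0.5 up to (not including) k + 0.5.
    return math.floor(age_years + 0.5)

def age_class_last_birthday(age_years: float) -> int:
    """Class label when ages are reported as of the last birthday."""
    # Class k covers k up to (not including) k + 1.
    return math.floor(age_years)

age = 14 + 9 / 12                        # 14 years and 9 months
print(age_class_nearest(age))            # 15
print(age_class_last_birthday(age))      # 14
```

The same person falls in different classes under the two conventions, which is why the way the ages were collected must be known before the class limits are fixed.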

A Measure Occupies an Interval Whose Limits Extend Above and Below the 
Value of the Measure 

This problem of establishing the mathematical limits of a class interval
has been dealt with fairly systematically. However, two principles or methods
are used in statistical literature. The difference between them can readily be
illustrated by means of integral class intervals for measures which are
themselves integral values. Consider, for example, the series of intelligence test
scores in Table 5:1. If the size of each class interval is taken as 1, then each
test score will have the range of a class interval. Some authors consider the
lower limit of such an interval to be the value of the measure itself. Thus,
the lower limit of the class interval for intelligence test scores of 100 will
be 100. The upper limit will be 100.99+ (to but not including 101.0). Other
authors consider that the measure occupies an interval whose mid-value
corresponds with the value of the measure itself. In this case, the lower
limit for intelligence test scores of 100 would be 99.5 and the upper limit
would be 100.499+ (to but not including 100.5).

We shall employ the latter interpretation; that is, we shall consider the
mathematical limits of a class interval as equal to a half unit below and a
half unit above the actual measures in the interval. Thus a reaction-time
score measured to the one-hundredth of a second occupies a class interval
that ranges a half hundredth below and a half hundredth above the score
value: a score of .04 second occupies a unit interval with mathematical
limits of .035 and .0449+, or from .035 to .045⁻ (i.e., with .045 as a limit).
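The two conventions can be contrasted in a short Python sketch. Exact fractions are used here (an implementation choice, not something the text prescribes) so that limits such as .035 are not distorted by binary floating point; the function names are hypothetical.

```python
from fractions import Fraction

def limits_lower(value: Fraction, unit: Fraction) -> tuple[Fraction, Fraction]:
    """First convention: the measure is the lower limit, interval [value, value + unit)."""
    return value, value + unit

def limits_mid(value: Fraction, unit: Fraction) -> tuple[Fraction, Fraction]:
    """Convention adopted in the text: the measure is the mid-value,
    interval [value - unit/2, value + unit/2)."""
    half = unit / 2
    return value - half, value + half

examples = [
    (Fraction(100), Fraction(1)),            # intelligence test score of 100
    (Fraction("0.04"), Fraction("0.01")),    # reaction time of .04 second
]
for value, unit in examples:
    lo, hi = limits_mid(value, unit)
    print(f"{float(value)}: {float(lo)} up to (not including) {float(hi)}")
```

Under `limits_lower` the score of 100 would instead occupy 100 up to 101; under `limits_mid` it occupies 99.5 up to 100.5, and the .04-second score occupies .035 up to .045.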

This procedure not only is in agreement with general practice but has a
more logical foundation than the first method. Any measure is subject to
errors of observation. Such errors are as likely to affect a measure favorably
(positively) as unfavorably (negatively). Therefore it is logical to interpret
the measures obtained as lying, on the average, near the middle of unit class
intervals. The actual range of error may, of course, extend beyond the lower
and upper limits of a unit interval. However, regardless of this possibility,
when the mathematical limits of the interval are taken as extending a half unit
below and above the value of an observed measure, the results will be more
likely to agree with the facts than if the mathematical limits extend from unit
value to unit value.

It is also to be observed that most of the measurements in psychology and 
related fields of investigation do not have any practical or useful meaning 
unless they are considered in relation to the other measures of the series of 
which they are a part. Since such measurements are relative values anyway, 
a difference of half an integral value makes no real difference. As T. L. Kelley 
pointed out more than twenty years ago in discussing this problem: 

“Uniformity is needed, and it would be in harmony with well-nigh universal
procedure in the physical and biological fields to consider a score of 10 as 
being also a class index, or midpoint of an interval. Should this lower the 
grade of a few million school children by one half a point, no harm would be 
done and the great advantage of having the recorded test scores exactly 
those to be used in calculating means, standard deviations, correlations, etc., 
and of having the recorded measures also the class indexes in graphs is 
attained.” * 

Finally, another advantage of this procedure for interpreting the mathematical
limits of a class interval is the fact that a psychological test score,
or any measure, is a value that is to be interpreted as occupying an interval
rather than as coinciding with the value of a point on a scale of measures.
An intelligence test score of 100 is to be interpreted as an index that occupies
a class interval, say from 99.5 to 100.5⁻, rather than as an index equal to
the point value of 100. At the same time, for statistical purposes this index
can be treated algebraically as 100 since 100 represents the mid-value of the
interval.

There is only one real exception to the application of this middle-of-the-interval
interpretation of a measure. The exception is simply stated. If a measure
is originally derived in such a way that its value definitely signifies the
lower limit of a unit interval, then the limits of the interval should be
established to fit that fact. As already indicated, age measures as sometimes
taken prove an exception to the general rule; that is, when individuals are asked
to give their age in years as of their last birthday, the data obtained will form
a series in which integral year values should be used as the lower limits of
successive class intervals.

Mathematical vs. Written Interval Limits 

In practice the actual mathematical limits of a class interval are not
always expressly stated. Whether or not they are explicitly written, they are
implied in all statistical computations. Thus, if a group of test scores is
tabulated into class intervals, each with a range of five test score units, the
lowest class interval is ordinarily written as 10 to 14, the next lowest as 15 to 19,
etc. However, in line with the interpretation we have indicated, the mathematical
limits of these intervals are 9.5 to but not including 14.5, 14.5 to but
not including 19.5, etc. The use of score values rather than mathematical
values for denoting class-interval limits has a twofold advantage: (1) the
written notation itself is simpler; (2) the integral score limits do not suggest,
as do the mathematical limits, that fractionate values actually occur in the
test score data. This latter point is not unimportant. Whenever the most
refined measurements are integral values, it is well to set up class intervals
for frequency distributions that will not suggest a precision of measurement
finer than the obtained integral values.

* T. L. Kelley, Statistical Method, Macmillan, New York, 1923, pp. 12-13.
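As a small illustration, assuming integral scores with a measurement unit of 1 (the function name is hypothetical), written limits convert to mathematical limits by moving half a unit outward at each end:

```python
def mathematical_limits(written_lower: int, written_upper: int, unit: float = 1.0) -> tuple[float, float]:
    """Mathematical limits implied by written (integral) interval limits.

    The lower limit moves half a unit down and the upper limit half a unit up;
    the upper mathematical limit itself belongs to the next interval.
    """
    half = unit / 2
    return written_lower - half, written_upper + half

for lower in (10, 15, 20):
    upper = lower + 4   # five-unit intervals: 10 to 14, 15 to 19, 20 to 24
    print(f"written {lower} to {upper} -> mathematical {mathematical_limits(lower, upper)}")
```

Successive intervals share a boundary (14.5, 19.5, ...), so the mathematical limits leave no gaps on the scale even though the written limits skip from 14 to 15.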


The Mid-Point Value of a Class Interval 

The problem of determining the mid-value of any class interval is simplified 
by the adoption of a standard interpretation for the mathematical limits of 
class intervals. The mid-point of a class interval is obviously the value exactly 
midway between the mathematical limits of the interval. It can readily be
obtained by taking the difference between the mathematical values of the upper
and lower limits, dividing this difference in half, and adding the result to the
lower limit value. Or half of this difference may be subtracted from the
mathematical value of the upper limit of the interval. An even simpler method
is to add the mathematical values of the lower and upper limits and divide
this sum by 2. In other words, the value of the mid-point of an interval is
equal to the average of the two mathematical limits of the interval.
Thus, for an interval with mathematical limits 9.5 and 14.5, the mid-point
value is

$$\frac{9.5 + 14.5}{2} = \frac{24.0}{2} = 12.0$$


The application of the procedures described in the preceding paragraphs 
is illustrated in Table 5:8 and Fig. 5:1 for some commonly used class intervals. 


Table 5:8. Mid-Point Values of Class Intervals of a Size Commonly Used
for Research Results

Written Class-      Mathematical        Unit Size of    Mid-Point
Interval Limits *   Limits              Interval        Value

10                   9.5 and 10.5⁻       1 unit         10.0
10-11                9.5 and 11.5⁻       2 units        10.5
9-11                 8.5 and 11.5⁻       3 units        10.0
8-11                 7.5 and 11.5⁻       4 units         9.5
10-14                9.5 and 14.5⁻       5 units        12.0
10-19                9.5 and 19.5⁻      10 units        14.5
25-49               24.5 and 49.5⁻      25 units        37.0

* Note that the lower limits of these class intervals are written as values equal to multiples
of the unit size of the interval, and that class intervals of one unit do not have written limits,
since the interval value is the mid-point of the interval.
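The entries of Table 5:8 can be checked with a few lines of Python. This sketch assumes integral scores with a unit of 1; `interval_summary` is a hypothetical helper, and the mid-point is computed as the average of the two mathematical limits, as described above.

```python
def interval_summary(written_lower: int, written_upper: int) -> tuple[float, float, int, float]:
    """Mathematical limits, unit size, and mid-point of an interval of integral scores."""
    lower = written_lower - 0.5
    upper = written_upper + 0.5
    size = written_upper - written_lower + 1
    midpoint = (lower + upper) / 2          # average of the two mathematical limits
    return lower, upper, size, midpoint

# Written limits of the rows of Table 5:8 (a one-unit interval has equal limits).
table_5_8 = [(10, 10), (10, 11), (9, 11), (8, 11), (10, 14), (10, 19), (25, 49)]
for w_lo, w_hi in table_5_8:
    lower, upper, size, mid = interval_summary(w_lo, w_hi)
    print(f"{w_lo}-{w_hi}: {lower} and {upper}, {size} units, mid-point {mid}")
```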














Fig. 5:1. Illustration of Mathematical Limits, Written Limits, and Mid-Point Values of
Varying Size Class Intervals for Scales of Measures

[Figure: panels for intervals of one, two, three, four, and five units, each marked with
M.L. (Mathematical Limits), S.M. (Scale of Measures), W.L. (Written Limits), and
M-P. (Mid-Point Values).]


The Choice of the Scale Limits of Class Intervals 

Once the size (or unit range) of the class intervals to be used for a frequency
distribution has been decided upon, the next step consists in selecting
the starting point of the highest (or lowest) class interval for the series of
scores. If the range of a collection of scores is from 3 to 100, as in Table 5:1,
and if the investigator has decided to use class intervals with a range of ten
units, should he use a score of 3 to begin the lowest interval, or will some other
number be more convenient? The answer is somewhat arbitrary, but the
ordinary procedure is usually systematized. If the original data form a series
that has regularly spaced gaps which yield few or no frequencies, the class-interval
limits should be chosen so that the frequencies which do occur lie
near the mid-values of each successive interval. Such a series sometimes occurs
in percentage school grades because teachers recognize (consciously or
unconsciously) that reliable differentiations of pupils' achievement to within one
per cent are hardly possible, and therefore they record grades at those values
which are multiples of 5. In such a case, the class-interval limits should be
chosen so that their mid-point values will be 95%, 90%, 85%, 80%, etc.

However, if the data of a series of measures are fairly well distributed 
throughout the scale, the lower limit value of each class interval is commonly 
set by taking its written value as a multiple of the size of the interval. Class 
intervals with a range of ten units would, under this procedure, have a written 
lower limit equal to 10 or a multiple of 10 (20, 30, etc.), or zero, if needed.
Sometimes this systematization of the choice of written limits for class
intervals is applied to the upper rather than the lower limit; that is, the
written value of the upper limit of each interval is taken as a multiple of the 
size of the interval. 

Class Intervals for the Intelligence Test Scores in Table 5:1 

If either of the preceding principles is applied to the data in Table 5:1, the 
range of which was from 3 to 100, and if class intervals of ten units are used, 
the written value of the lower limit of the lowest class interval will be zero, 
or the written value of the upper limit of this class interval will be 10. This
interval will therefore be taken as 0 to 9, or as 1 to 10. Similarly, the written
value of the lower limit of the highest class interval will be 100 (the highest
score in the distribution), or the written value of the upper limit will also be
100. Thus, the highest class interval of ten units will be taken as 100 to 109
or as 91 to 100.

For these data it is better to take the upper (rather than the lower) limit
of each class interval as a multiple of the size of the interval. If this is not
done, and if the lower limit values are taken as multiples of ten, then the
highest class interval will range from 100 to 109, and its mid-value of 104.5
will be considerably higher than the highest score (100) obtained in
the distribution in Table 5:1. On the other hand, if the upper limit is used for
the whole distribution, the class intervals for all score possibilities of the data
in this table will be as follows:

91 to 100
81 to 90
71 to 80
61 to 70
51 to 60
41 to 50
31 to 40
21 to 30
11 to 20
1 to 10
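A sketch of both ways of fixing the scale limits, assuming integral scores; `class_intervals` and its `anchor` parameter are hypothetical names used only for illustration. With `anchor="upper"` it reproduces the list just given; with `anchor="lower"` it gives the alternative whose highest interval is 100 to 109.

```python
def class_intervals(lowest: int, highest: int, width: int, anchor: str = "lower") -> list[tuple[int, int]]:
    """Written limits of equal-width intervals covering lowest..highest, highest first.

    anchor="lower": written lower limits are multiples of width (0 to 9, 10 to 19, ...).
    anchor="upper": written upper limits are multiples of width (1 to 10, 11 to 20, ...).
    """
    if anchor == "lower":
        start = (lowest // width) * width            # multiple of width at or below lowest
    elif anchor == "upper":
        start = ((lowest - 1) // width) * width + 1  # one above a multiple of width
    else:
        raise ValueError("anchor must be 'lower' or 'upper'")
    intervals = []
    lower = start
    while lower <= highest:
        intervals.append((lower, lower + width - 1))
        lower += width
    return intervals[::-1]                           # list the highest interval first

for lo, hi in class_intervals(3, 100, 10, anchor="upper"):
    print(f"{lo} to {hi}")
```

Called as `class_intervals(-165, 126, 20, "lower")`, the same sketch yields the sixteen intervals from 120 to 139 down to -180 to -161 used for the Bernreuter data below; Python's floor division rounds toward minus infinity, which keeps the negative limits aligned on multiples of 20.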





Class Intervals for the Bernreuter Data in Table 5:2 

Let us now consider the Bernreuter data in Table 5:2. As was previously 
indicated, the range of these data is -165 to 126, and the size of the interval
to be used will be 20 units if approximately 15 class intervals are desired. If 
the written values of the lower limit of each interval are taken as a multiple 
of 20, the class intervals for all score possibilities will be as follows: 


120 to 139
100 to 119
80 to 99
60 to 79
40 to 59
20 to 39
0 to 19
-20 to -1
-40 to -21
-60 to -41
-80 to -61
-100 to -81
-120 to -101
-140 to -121
-160 to -141
-180 to -161


The Tally 

With the class intervals chosen for the data in Tables 5:1 and 5:2, we are 
now ready to obtain the actual distribution of frequencies for each by means 
of a tally of each set of data. The tally procedure is identical with that
described for categorical data in Chapter 2.

The simplest method is to check each score as it appears in the table, going 
down the columns or across the rows, and to tally each in its appropriate 
class interval. If the original data are on individual cards rather than in a 
table, each case, in turn, is tallied for the variable under consideration.* 

The first score in Table 5:1 is 49. It will therefore be tallied in the class 
interval, 41 to 50. The next score, going down the column, is 52, and it will 
be in the interval, 51 to 60. Continuing with each score, we have the result 
shown in Table 5:9. The tally for the data in Table 5:2 is presented in 
Table 5:10. 


* Note that when a correlation coefficient is to be computed for the relation between two
variables, a cross-tabulation yielding a correlation tally is made, instead of separate frequency
distributions for each variable. The two frequency distributions are then obtained
from the correlation tally. (See chap. 9.)
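The tally step amounts to locating the class interval for each score. A minimal sketch, assuming integral scores and equal-width intervals whose written lower (or upper) limits are multiples of the width; `interval_for` is a hypothetical helper, not part of any library.

```python
def interval_for(score: int, width: int, anchor: str = "lower") -> tuple[int, int]:
    """Written limits of the class interval in which `score` is tallied."""
    if anchor == "lower":          # written lower limits are multiples of width
        lower = (score // width) * width
    elif anchor == "upper":        # written upper limits are multiples of width
        lower = ((score - 1) // width) * width + 1
    else:
        raise ValueError("anchor must be 'lower' or 'upper'")
    return lower, lower + width - 1

# First two scores of Table 5:1 (intervals 1 to 10, ..., 91 to 100).
print(interval_for(49, 10, "upper"))   # (41, 50)
print(interval_for(52, 10, "upper"))   # (51, 60)
# Extreme Bernreuter scores (intervals -180 to -161, ..., 120 to 139).
print(interval_for(-165, 20))          # (-180, -161)
print(interval_for(126, 20))           # (120, 139)
```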





Table 5:9. Tally of Intelligence Test Scores
(Data from Table 5:1)

[Box tallies for the class intervals 91 to 100, 81 to 90, 71 to 80, 61 to 70, 51 to 60,
41 to 50, 31 to 40, 21 to 30, 11 to 20, and 1 to 10.]

Table 5:10. Tally of Bernreuter Scores
(Data from Table 5:2)

[Box tallies for the sixteen class intervals of twenty units, from 120 to 139 down to
-180 to -161; the interval 80 to 99 has no tallies.]


The box method of tallying, described briefly in Chapter 2, has been employed
in Tables 5:9 and 5:10. This is in contrast to the older procedure for
tallying, in which each four successive tallies are denoted by vertical lines,
the fifth tally being denoted by a slanting line drawn through the four lines.

Either method is satisfactory, but the box method is preferable because there
is likely to be less error when the frequencies for each class interval are counted.


The Frequency Distribution 

The final frequency distribution is readily obtained from the tally by 
enumerating the number of tallies for each class interval. The distributions for 
the data in Tables 5:1 and 5:2 are presented in Tables 5:11 and 5:12. 
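Counting the tallies can be sketched with Python's `collections.Counter`. The scores below are hypothetical (the individual scores of Table 5:2 are not reproduced here), and `frequency_distribution` is a hypothetical helper. Every class interval between the lowest and highest occupied ones is listed, so a null class such as 80 to 99 appears with f = 0, and the frequencies add up to N.

```python
from collections import Counter

def frequency_distribution(scores, width):
    """(written lower, written upper, f) per class interval, highest interval first.

    Written lower limits are multiples of `width`; null classes get f = 0.
    """
    tally = Counter((s // width) * width for s in scores)   # key: written lower limit
    lowest, highest = min(tally), max(tally)
    return [(lower, lower + width - 1, tally[lower])       # Counter returns 0 for a null class
            for lower in range(highest, lowest - 1, -width)]

scores = [-165, -142, -118, -47, -45, -3, 12, 64, 121]      # hypothetical Bernreuter-type scores
rows = frequency_distribution(scores, 20)
for lower, upper, f in rows:
    print(f"{lower:>5} to {upper:>5}  {f}")
print("N =", sum(f for _, _, f in rows))

# Check of N for the published distributions (frequencies copied from the tables).
table_5_11 = [6, 5, 13, 15, 13, 9, 13, 12, 11, 3]
table_5_12 = [1, 2, 0, 4, 3, 3, 6, 12, 17, 16, 12, 11, 7, 4, 1, 1]
print(sum(table_5_11), sum(table_5_12))                     # 100 100
```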




Table 5:11. Frequency Distribution of the Intelligence Test Scores in Table 5:9

Class Intervals     Frequencies (f)
91 to 100             6
81 to 90              5
71 to 80             13
61 to 70             15
51 to 60             13
41 to 50              9
31 to 40             13
21 to 30             12
11 to 20             11
1 to 10               3
                  N = 100


Table 5:12. Frequency Distribution of the Bernreuter Scores in Table 5:10

Class Intervals     Frequencies (f)
120 to 139            1
100 to 119            2
80 to 99              0
60 to 79              4
40 to 59              3
20 to 39              3
0 to 19               6
-20 to -1            12
-40 to -21           17
-60 to -41           16
-80 to -61           12
-100 to -81          11
-120 to -101          7
-140 to -121          4
-160 to -141          1
-180 to -161          1
                  N = 100


In published reports the actual tally of original data is rarely included; 
rather, the distribution of a variable is usually presented as in these two 
tables. However, if the page is turned counterclockwise 90 degrees, the tally 
distributions in Tables 5:9 and 5:10 give a rough but graphic picture of the 
structure of each distribution, a picture not so readily conveyed by the
frequency distributions in Tables 5:11 and 5:12. 


D. THE HISTOGRAM AND THE FREQUENCY POLYGON 

In order to give a more concrete picture of the structure of a group of
variable data than is afforded by the frequency distribution alone, the results
are portrayed graphically. Two types of graphs are used for frequency
distributions: one is the frequency curve (line graph) and the other is the
histogram. Each presents about the same picture, except that the curve tends to
emphasize the continuity and general sweep of a distribution, whereas the
histogram tends to emphasize distinctions from class interval to class interval.
Which type of graph should be employed is for the most part a matter of
personal preference unless one distribution is to be compared with another
distribution which has already been graphed. In this case, the same type of
graph should be employed for the second distribution.

Figs. 5:2 and 5:3 are histograms of the frequency distributions in Tables 5:11 
and 5:12. Fig. 5:4 shows both a histogram and a frequency curve of the 
frequency distribution in Table 5:7. Figs. 5:5 and 5:6 are frequency curves 
of the histograms in Figs. 5:2 and 5:3. 

The Histogram 

The construction of a graph of a frequency distribution is greatly facilitated 
by the use of standard cross-section paper, either millimeter paper or paper 
ruled off in 20 units to the inch. 

The distribution itself is plotted in the frame of reference of a geometric 
field, with two coordinate axes drawn at right angles to each other. The 
abscissa, or horizontal axis, is usually denoted as the x-axis. The ordinate, 
or vertical axis, is denoted as the y-axis. The two axes intersect at the origin. 
In a frequency distribution, the value of the y-axis at the origin is always
equal to zero. This is because frequencies are always scaled on the vertical or
y-axis, from zero to a value equal to the maximum frequency obtained for any
class interval of the distribution. The x-axis, on the other hand, represents
the scale of scores, or rating values. Since there is no true zero point in
psychological scales, the presence of a zero for the x-axis depends on whether
the particular series of measures being graphed includes a value of zero. The
intelligence test scores in Fig. 5:2 are scaled from zero to 100, although
actually there were no zeros in the distribution.

The Bernreuter scores include not only a zero but also negative numbers.
This case is typical of the implications of a zero value on psychological scales.
Zero is a number that serves to identify the position of an individual's
performance or rating in a series. It does not signify a mathematical zero, that
is, nothing; in the case of the Bernreuter scale, it does not mean “no
sociability.” Similarly, the negative numbers in this distribution have no true
algebraic significance. They serve rather to indicate position on a scale on
which negative as well as positive numbers happen to be used. According to
Bernreuter's Manual for the Personality Inventory from which these scores
were derived, “Persons scoring high on this scale tend to be non-social,
solitary, or independent. Those scoring low tend to be sociable and gregarious.”
Since any variable scaled on the x-axis should at the least signify for
the attribute or trait an order which ranges from least to most, the scale in
Fig. 5:3 presumably may be interpreted as ranging from “least non-sociability”
to “most non-sociability.” Whether, in fact, such an interpretation is
justified depends upon an empirical validation of the instrument (cf. Chapter
17, Section C).


Fig. 5:2. Histogram of Intelligence Test Scores: Frequency Distribution of Table 5:11

Fig. 5:3. Histogram of Bernreuter Scores: Frequency Distribution of Table 5:12
(x-axis: Bernreuter Sociability Scores)

Scaling the y-axis and x-axis

The first problem in constructing a histogram is to mark off on graph
paper the lengths of the x-axis and y-axis. There are no basic logical
requirements to guide one here; what one does is rather a matter of general
practice and convenience. Since the basic purpose of a histogram is to give a
picture of the distribution, aesthetic considerations enter into the choice of
procedure. Somewhat balanced proportions for the two scales are desirable.
The scale on the x-axis is generally made somewhat longer than that on the
y-axis, so as to give the effect of a figure which rests solidly on its base. If
the ordinate scale is considerably longer than the abscissa scale, the effect
is likely to be an unbalanced, top-heavy superstructure.

In practice, another consideration enters into the scaling of the two axes.
For distributions of variables which tend to be of the normal bell-shaped
type (cf. Fig. 1:1), the lengths of the x- and y-axes are chosen so as to be in a
proportion of about 3 to 2. However, the distribution in Fig. 5:2 does not
resemble a bell-shaped curve. It tends to be more rectangular, no doubt
because of a relatively constant level of difficulty among items in the
intelligence test. The scores of the Bernreuter Inventory in Fig. 5:3 are scaled on
axes in the proportion of 3| to 2f. This curve is more similar to the bell-shaped
curve than is Fig. 5:2. The distribution of Bernreuter scores is definitely
uni-modal near the center, and the frequencies decrease above and below the
center roughly in the bilaterally symmetrical fashion characteristic of the
normal bell-type distribution. That this bilateral decrease is not too marked is
indicated by a slight piling up of frequencies between -100 and -30 and by
a proportionate drop in frequencies for the upper half of the scale.

The distribution of Strong Interest ratings (for Physician) in Fig. 5:4 bears
no resemblance to the “normal” distribution, but tends rather to be of the
L-type. The two axes have nevertheless been scaled in a proportion roughly
3 to 2.

Fig. 5:4. Histogram of Strong Interest Ratings with Line Graph: Frequency
Distribution of Table 5:7 (x-axis: Interest Ratings for Physicians)


Drawing the Histogram 

Having decided upon the lengths to use in scaling the x- and y-axes, we
proceed to draw in the scales on the graph paper and to plot the actual frequencies
for each class interval. In the histogram a semi-rectangular figure is used to
indicate the relation between a given class interval and its frequencies. The
width of the rectangle is equal to the width of the given class interval on the
score scale. Technically the width should be taken as equal to the mathematical
limits of the interval. In practice, however, the written score limits
are used. Either procedure will of course give the same picture of the
distribution. The only difference will be that the whole histogram will be shifted to
the left by half a scale unit when the mathematical limits are used. In any
event, when scores are integral values, as is the case in Figs. 5:2 and 5:3, the
written scale for the x-axis is in terms of the integral values of either the lower or
upper limits (not both) of successive intervals, rather than in terms of the
mathematical limits.

Close inspection of Fig. 5:2 reveals that it has been drawn to the mathematical
limits of each class interval. Thus the frequencies of the first interval
at the lower (left) end of the scale are plotted for limits of 0.5 to 10.5, the
integral limits having been taken as 1 to 10.
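The bar geometry just described can be sketched in Python for the distribution of Table 5:11. The helper name is hypothetical, and the text-mode bars merely stand in for a plotted figure. Each bar spans the mathematical limits of its interval, so the first runs from 0.5 to 10.5; the total bar area equals N times the interval width, consistent with treating the scores as spread evenly within each interval.

```python
def histogram_bars(written_intervals, freqs):
    """(left edge, right edge, height) per bar, with edges at the mathematical limits."""
    return [(lo - 0.5, hi + 0.5, f) for (lo, hi), f in zip(written_intervals, freqs)]

# Table 5:11, listed from the lowest interval (1 to 10) to the highest (91 to 100).
intervals = [(lo, lo + 9) for lo in range(1, 101, 10)]
freqs = [3, 11, 12, 13, 9, 13, 15, 13, 5, 6]

bars = histogram_bars(intervals, freqs)
for left, right, f in bars:
    print(f"{left:5.1f} to {right:5.1f} | {'#' * f}")

area = sum(f * (right - left) for left, right, f in bars)
print("area =", area)   # N x interval width = 100 x 10
```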

The height of the horizontal line drawn for each class interval is of course 
determined by the number of frequencies in the interval. If an interval has no 
frequencies, as in one case in Fig. 5:3, the graph of frequencies drops to the 
abscissa and a gap appears in the histogram. 

Two additional points about the histogram should be noted. The first has
to do with its actual construction. The frequencies of each class interval are
often represented by a closed rectangle, as in Fig. 5:4. Whether the rectangles
are closed, or open as in Figs. 5:2 and 5:3, is to a considerable extent a matter
of personal preference. However, it is generally preferable not to close them,
so that the continuity and general structure of the surface of the distribution
of the variable will be readily perceived. From a logical point of view,
the closed rectangles suggest a non-continuous distribution such as is
characteristic of the bar diagrams of categorical data (cf. Figs. 3:4, 3:5, and 3:6).
Furthermore, in the case of variables the statistical treatment is developed
on the assumption of a continuous, rather than a discrete or discontinuous,
series of values. For this reason it is well to emphasize this continuity by not
closing the rectangles.

The second point has to do with an assumption that is made when a histogram
is used to portray the distribution of frequencies of a variable. The
area under the surface of a histogram is an area of frequencies for the given
scale of scores. This procedure assumes that the scores are distributed evenly
throughout a given class interval, hence the horizontal line at the top of each
semi-rectangle. Although this assumption may not be entirely supported by
the ungrouped data for some distributions, it is adopted in the interest of
statistical convenience. At the same time this assumption does not have to
be closely fulfilled in practice in order for the results of the statistical
treatment to have sufficient validity for ordinary psychological interpretation.

The Frequency Polygon, or Line Graph 

The frequency distributions of variables are often shown more graphically 
by the frequency curve, or line graph, than by the histogram. A line graph 
has been superimposed on the histogram of the interest ratings in Fig. 5:4. 
Figs. 5:5 and 5:6 are frequency curves for the intelligence test and Bernreuter
scores shown by histograms in Figs. 5:2 and 5:3.


Fig. 5:5. Frequency Curve (or Line Graph) of Intelligence Test Scores of Table 5:11

Fig. 5:6. Frequency Curve of Bernreuter Scores (Data of Table 5:12)
(x-axis: Bernreuter Sociability Scores)

The graph of a frequency curve is prepared like a histogram except that 
the frequencies for each class interval are represented by points plotted with 
respect to the middle of each interval. The curve is then drawn by connecting 
the plotted points with straight lines. 
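For the frequency curve, each frequency is plotted at the mid-point of its interval and successive points are joined by straight lines. A sketch for Table 5:11 (the helper name is hypothetical) that computes the plotted points and the line segments joining them:

```python
def polygon_points(written_intervals, freqs):
    """(mid-point, frequency) pairs; mid-point = average of the mathematical limits."""
    return [((lo - 0.5 + hi + 0.5) / 2, f) for (lo, hi), f in zip(written_intervals, freqs)]

# Table 5:11, listed from the lowest interval (1 to 10) to the highest (91 to 100).
intervals = [(lo, lo + 9) for lo in range(1, 101, 10)]
freqs = [3, 11, 12, 13, 9, 13, 15, 13, 5, 6]

points = polygon_points(intervals, freqs)
segments = list(zip(points, points[1:]))   # straight lines between successive points
for (x0, y0), (x1, y1) in segments:
    print(f"({x0}, {y0}) -> ({x1}, {y1})")
```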

We have seen that the mid-point values of class intervals can be obtained 



118 


THE REDUCTION AND ORGANIZATION OF VARIATE DATA 


by finding the difference between the mathematical limits and adding one-half 
of this difference to the lower limit. It is often desirable to denote the 
score scale on the x-axis in terms of the mid-point values of successive 
class intervals rather than in terms of the values of successive lower limits. 
It makes no difference which values are denoted on the scale, so long as 
the frequencies are plotted correctly in relation to the mid-point of each 
interval. If the mid-points of the class intervals are fractional values, the 
successive lower integral limits of the class intervals are usually employed. 
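The two steps just described (finding each mid-point from the mathematical limits, then plotting the frequency of the interval at that mid-point) can be sketched in Python. This is only an illustrative sketch, not a procedure given in the text; the function names `midpoint` and `polygon_points` are hypothetical, and the limits and frequencies are those of the sitting-age data in Table 5:13.

```python
def midpoint(lower_limit, upper_limit):
    """Mid-point: lower mathematical limit plus half the interval width."""
    return lower_limit + (upper_limit - lower_limit) / 2


def polygon_points(limits, freqs):
    """(mid-point, frequency) pairs; the polygon joins them with straight lines."""
    return [(midpoint(lo, hi), f) for (lo, hi), f in zip(limits, freqs)]


# Mathematical limits and frequencies of the sitting-age data in Table 5:13.
limits = [(4.5, 5.5), (5.5, 6.5), (6.5, 7.5), (7.5, 8.5),
          (8.5, 9.5), (9.5, 10.5), (10.5, 11.5), (11.5, 12.5)]
freqs = [9, 29, 36, 30, 15, 6, 5, 1]

points = polygon_points(limits, freqs)
print(points[0], points[-1])  # (5.0, 9) (12.0, 1)
```

Joining successive pairs with straight lines gives the line graph; the horizontal scale may then be labeled with the mid-point values themselves.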

Comparative Usefulness of the Histogram and Frequency Polygon 

An important advantage of the frequency curve over the histogram is that 
the frequency distributions of several groups of data for a variable can be 
compared on the same graph more readily by means of a line graph, or 
frequency curve. When such comparisons are to be made, the curve for each 
distribution should be clearly differentiated on the graph by a different 
type of line: a dotted line, a short-dash line, a long-dash line, as well as 
the solid line used in Figs. 5:5 and 5:6. When, however, the total numbers of 
frequencies of the several groups being compared differ markedly, line 
graphs of the percentage frequency distributions described in the next 
section are likely to be more satisfactory than line graphs of the original 
frequency distributions. 

The histogram has an advantage over the line graph for series of data that 

Fig. 5:7. Line Graph of Frequencies of “Like” Responses for the Successive Program Units of a Radio Broadcast (axes: Program Units; Time in Minutes)











THE HISTOGRAM AND THE FREQUENCY POLYGON 




are grouped into class intervals unequal in size. Although such class intervals 
are rarely used for the frequency distributions of variables, they are often 
employed for the frequency data of a time series sequence. The superiority 
of the histogram over the line graph for such data is illustrated by Figs. 5:7 
and 5:8, each of which depicts the trend of the same set of data.* The 
frequency scale (ordinate) represents the number of listeners expressing favorable 


Fig. 5:8. Histogram of Frequencies of “Like” Responses for the Successive Program Units of a Radio Broadcast (abscissa: Time in Minutes)



attitudes toward the successive sequences, or program units, of a radio program. 
The program units of the broadcast are indicated on the horizontal scale 
(abscissa), each scaled according to the amount of broadcast time it required. 
In studies of audience reaction, the parts of a radio program should be 
divided on a functional or meaningful basis rather than into arbitrary, 
equally spaced time intervals. 

The line graph in Fig. 5:7 is inadequate because the level of the frequency 
of response for any particular part of the program is not clearly portrayed. 
Furthermore, it suggests a rising or declining trend of response within a 
program unit. Thus, the line graph is likely to imply that there was a higher 
frequency of favorable responses for the beginning and ending of the sixth 
program unit than for the middle of it. Actually, as indicated by the histogram 
of these same data, the information plotted is for the average number 


* These data were obtained from a Program Analyzer test of a radio broadcast. Cf. J. G. 
Peatman and Tore Hallonquist, The Patterning of Listener Attitudes Toward Radio 
Broadcasts: Methods and Results, Stanford Univ. Press, Stanford, 1945. 










of reactions for each sequence as a whole. The rectangular character of the 
histogram indicates this fact and avoids the misleading suggestion given by 
the line graph. In addition, the histogram indicates more clearly the absolute 
as well as the relative length of each program unit. 

E. THE PERCENTAGE FREQUENCY DISTRIBUTION AND POLYGON 

The value of converting the frequencies of two or more distributions of a 
variable into percentage frequencies, so that the structure of their respective 
distributions may be compared more fairly, is well illustrated by Figs. 5:9 
and 5:10, in which are compared the same two sets of data on the sitting ages 


Fig. 5:9. Comparison of Two Distributions of Infants' Ages (in Months) of Beginning to Sit Alone



of two groups of infants.* The absolute frequencies in Fig. 5:9 yield two 
frequency curves whose forms are apparently rather different, the one peaked, 
the other flat. However, as clearly revealed by the percentage frequency 
distributions in Fig. 5:10, the structure of the two sets of results is very 
similar. That this is the case is seen when the different N's of the two groups 
are taken into account by converting the frequencies of each class interval to 
their percentage of the total frequencies. 

The construction of percentage frequency curves is identical with that of 
ordinary frequency curves except for the scaling of percentage values, instead 
of absolute frequencies, on the ordinate. The percentage values for each class 
interval of a distribution are most readily obtained by first computing to 
two or three decimal places the percentage value of one frequency. The 
product of the number of frequencies per interval and the percentage value 


* Data from J. G. Peatman and R. A. Higgons, “Relation of Infants' Weight and Body 
Build to Locomotor Development,” American Journal of Orthopsychiatry, 12:234-240, 1942. 



THE PERCENTAGE CUMULATIVE FREQUENCY DISTRIBUTION 




of a single frequency will give the desired percentage frequency value for the 
interval. Inasmuch as the percentage value of a single frequency is always the 
product of 100 and the reciprocal of N (the total sample), the necessary 


Fig. 5:10. Comparison by Percentage Frequency Distributions of Infants' Sitting 
Ages. (Based on Same Data as Those in Fig. 5:9) 



figures can be obtained from printed tables such as Barlow's.* Thus, N for 
the first sample, as indicated in Fig. 5:9, was 131; the reciprocal of N is 

$$\frac{1}{N} = \frac{1}{131} = .00763$$

and 100(.00763) = 0.763%, the percentage value of a single frequency. 
Similarly, for the second distribution, where N = 261: 

$$\text{Percentage value of a single frequency} = 100\left(\frac{1}{261}\right) = 100(.00383) = 0.383\%$$

For purposes of graphing, it is sufficient to carry out the percentage 
frequencies of each class interval to only one decimal place. 
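The computation just described, multiplying the frequency of each interval by the percentage value of one frequency (100 times the reciprocal of N), can be sketched as follows. The function name is hypothetical; the frequencies are those of the 131-infant sample tabulated in Table 5:13, and the results are rounded to one decimal place, as suggested above.

```python
def percentage_frequencies(freqs, decimals=1):
    """Return (percentage value of one frequency, percentage for each interval)."""
    n = sum(freqs)
    one_freq = 100 / n  # 100 times the reciprocal of N
    return one_freq, [round(f * one_freq, decimals) for f in freqs]


# Frequencies of the 131-infant sample, sitting ages 5 through 12 months (Table 5:13).
freqs = [9, 29, 36, 30, 15, 6, 5, 1]
one, pcts = percentage_frequencies(freqs)
print(round(one, 3))  # 0.763
print(pcts)           # [6.9, 22.1, 27.5, 22.9, 11.5, 4.6, 3.8, 0.8]
```

For the second sample, `100 / 261` gives the 0.383% quoted above.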


F. THE CUMULATIVE AND PERCENTAGE CUMULATIVE 
FREQUENCY DISTRIBUTION 

The Cumulative Frequency Distribution 

The comparison of two or more frequency distributions is also facilitated 
by the use of cumulative frequency distributions, especially percentage 
cumulative frequency distributions. 

* Barlow's Tables of Squares, Cubes, Square Roots, Cube Roots and Reciprocals of All 
Integer Numbers up to 10,000, Spon, Ltd., London. See also Table I, Appendix C, for 
reciprocals of all integer numbers up to 1000. 





A cumulative frequency distribution is one for which the frequencies of 
adjoining class intervals, beginning at either end of the distribution, are 
added successively. In other words, the frequencies are cumulated from 
either end of the scale for successive class intervals. Ordinarily, as 
illustrated in the next to last column of Table 5:13, the frequencies are 
cumulated from the lower score values to the higher score values. The 
cumulative frequency for the final class interval should, of course, always 
be equal to the number of cases (N) in the distribution. 


Table 5:13. Cumulative and Percentage Cumulative Frequency Distributions
of Infants' Sitting Ages
(Data from Fig. 5:9)

Sitting Age in Months                          Percentage
(Age)   (Interval)            f      c.f.      c.f.
  12    (11.5 to 12.5-)       1      131       100.0%
  11    (10.5 to 11.5-)       5      130        99.2
  10    (9.5 to 10.5-)        6      125        95.4
   9    (8.5 to 9.5-)        15      119        90.8
   8    (7.5 to 8.5-)        30      104        79.4
   7    (6.5 to 7.5-)        36       74        56.5
   6    (5.5 to 6.5-)        29       38        29.0
   5    (4.5 to 5.5-)         9        9         6.9
                        N = 131

Percentage value of 1 frequency:

$$\frac{1}{131}(100) = (.00763)(100) = 0.763\%$$


In Table 5:13 the frequencies of the sitting-age scores of variable x in 
Fig. 5:9 are cumulated, beginning with the lowest class interval of 5 months. 
The 9 frequencies of this class interval are added to the 29 frequencies of the 
6-month interval, giving a total of 38 cumulated frequencies for these two 
intervals. This signifies that 38 of the 131 infants had sitting-age scores of 
less than 6½ months, since the upper mathematical limit of the 6-month class 
interval is 6.5. To these 38 cumulated frequencies are added the 36 frequencies 
of the next class interval, giving a total of 74 cumulated frequencies up to 
7½ months. This same procedure is carried out for each of the remaining class 
intervals in turn, the total cumulation equaling 131 frequencies for the 
final class interval, 12-month sitting age. 
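The cumulation just traced by hand can be sketched in Python with the standard-library `itertools.accumulate`, which forms successive running totals. The frequencies are those of Table 5:13; the printed labels only illustrate that each cumulated frequency refers to the upper mathematical limit of its interval.

```python
from itertools import accumulate

ages = [5, 6, 7, 8, 9, 10, 11, 12]      # class intervals (months), Table 5:13
freqs = [9, 29, 36, 30, 15, 6, 5, 1]

# Cumulate from the lowest interval upward.
cum = list(accumulate(freqs))
for age, cf in zip(ages, cum):
    # cf counts the infants whose sitting age is below the interval's upper limit
    print(f"under {age + 0.5} months: {cf}")

assert cum[-1] == sum(freqs)  # the final cumulation must equal N
```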

The Percentage Cumulative Frequency Distribution 

As the name implies, a percentage cumulative frequency distribution represents 
the conversion of the frequencies of a cumulated distribution to percentage 
values, the total number of cases in the distribution being taken as 













100%. The cumulative percentage values of each class interval for the sitting- 
age data are given in the last column of Table 5:13. 

The simplest method of converting cumulative frequencies to percentage 
cumulative frequencies is first to compute the percentage value of one 
frequency for the given distribution. As already indicated for percentage 
frequency distributions, the percentage value of one frequency is always equal to 

$$100\left(\frac{1}{N}\right) = \frac{100}{N}$$

where N is the total number of cases in the distribution. The percentage 
value of one frequency for the distribution in Table 5:13 is 0.763%. The 
number of cumulated frequencies for each class interval is therefore multiplied 
by this value to obtain the percentage cumulative frequency of each class 
interval. 

If a percentage frequency distribution of a variable is already available, 
the percentage cumulative frequency distribution can of course be obtained 
by directly cumulating the percentage frequency values of successive intervals. 
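Both routes to the percentage cumulative frequencies (multiplying each cumulated frequency by the percentage value of one frequency, or cumulating percentage frequencies directly) can be checked against the last column of Table 5:13 with a short sketch. One assumption of the sketch: the percentage frequencies are cumulated before rounding. Cumulating values already rounded to one decimal place drifts slightly; for these data it ends at 100.1 rather than 100.0.

```python
from itertools import accumulate

freqs = [9, 29, 36, 30, 15, 6, 5, 1]   # Table 5:13, 5 through 12 months
n = sum(freqs)
one_freq = 100 / n                     # percentage value of one frequency

# Route 1: multiply each cumulated frequency by the value of one frequency.
pct_cf = [round(cf * one_freq, 1) for cf in accumulate(freqs)]

# Route 2: cumulate the unrounded percentage frequencies, then round.
pct_cf_direct = [round(p, 1) for p in accumulate(f * one_freq for f in freqs)]

print(pct_cf)  # [6.9, 29.0, 56.5, 79.4, 90.8, 95.4, 99.2, 100.0]
```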


The Graphic Presentation of a Percentage Cumulative Distribution 

A percentage cumulative frequency distribution is easily graphed and is 
illustrated in Fig. 5:11. The procedure for laying off the axes is similar to 
that used in graphing the percentage frequency distributions in Fig. 5:10. 
The vertical or ordinate scale is laid off in percentages beginning with zero 
and ending with 100, and the variable is scaled on the x-axis. However, when 
the frequencies have been cumulated from the lower end of the distribution, 
the percentage cumulated frequencies of a given class interval are always 
plotted at the upper mathematical limit of the interval, because the 
cumulated frequencies of a given interval are equal to the sum of all the 
frequencies of the distribution up through that interval. 

Fig. 5:11. Cumulative Per Cent Frequency Curve. (Based on the Data in Table 5:13)
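The plotting rule just stated, placing each cumulated percentage at the upper mathematical limit of its interval, can be expressed as a short sketch that produces the points of a curve like Fig. 5:11. The function name is hypothetical. Starting the curve at 0% at the lower limit of the lowest interval (4.5 months) is an assumption made for drawing; the text itself specifies only the upper limits.

```python
from itertools import accumulate


def cumulative_points(lower_limits, width, freqs):
    """Points of a percentage cumulative frequency curve.

    Each cumulated percentage is placed at the UPPER mathematical limit of its
    interval. The leading 0% point at the lowest lower limit is an assumption.
    """
    n = sum(freqs)
    points = [(lower_limits[0], 0.0)]
    for lo, cf in zip(lower_limits, accumulate(freqs)):
        points.append((lo + width, round(100 * cf / n, 1)))
    return points


lower = [4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5, 11.5]   # Table 5:13, width 1 month
freqs = [9, 29, 36, 30, 15, 6, 5, 1]
pts = cumulative_points(lower, 1.0, freqs)
print(pts[1], pts[-1])  # (5.5, 6.9) (12.5, 100.0)
```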



The Ogive 

Sometimes the two scales are reversed. That is, the variable itself is laid off 
on the ordinate and the cumulated percentage frequencies are scaled on the 




abscissa. When this is done, the resulting figure is called an ogive. Formerly 
the percentage cumulative frequency distribution was more often scaled to 
give an ogive than the curve in Fig. 5:11. A simple reversal of the axis 
positions of the scales does not change the essential character of the 
information in the graph. However, since it is customary to plot all frequency 
distributions with the frequencies scaled on the ordinate axis, we shall follow 
this practice for the percentage cumulative frequency graph. The result serves 
research needs as effectively as the ogive, which is less convenient to use. 

Usefulness of Percentage Cumulative Graph for Comparing Distributions

The percentage cumulative frequency distribution is of considerable value 
for comparing two or more distributions of a variable, not only as a whole 
but also at any corresponding points. In Fig. 5:12, the two percentage 
frequency distributions in Fig. 5:10 have been plotted as percentage cumulative 
frequency curves. Although it is evident from an inspection of Fig. 5:10 that 
there is a tendency for the x-group of 131 infants to have older sitting ages 
than the y-group of 261 infants, that graph does not provide as satisfactory 
a basis for analyzing detailed differences throughout the range of the two 
distributions as does Fig. 5:12. 

Fig. 5:12. Comparison by Cumulative Percentage Frequency Distributions of Infants' 
Sitting Ages. (Based on Same Data as Those in Fig. 5:9) 



Fig. 5:12 is useful for comparing either percentage frequencies or sitting ages. 
For comparing percentage frequencies, horizontal lines are projected from 
three points (25%, 50%, and 75%) on the cumulative percentage scale to the 














two curves and down to the corresponding sitting-age values for each group. 
Whereas the first 25% of the y-group had sitting ages of 6 months or less, 
the first 25% of the x-group had sitting ages as great as 6 months and 1 week. 
This tendency for the y-group to have an earlier sitting age is present 
throughout the distributions. Thus, 75% of the infants of the y-group were 
sitting alone by 8 months, but 75% of the x-group had not attained this stage 
of development until about a week later. In fact, the difference between the 
two groups consistently averages about ¼ month of sitting age, beginning at 
6 months and continuing through the scale. 

Comparisons can also be made from sitting-age values. For example, what 
percentage of the x- and y-groups was sitting alone by the age of 7 months? 
This question can be readily answered by projecting a perpendicular line from 
the 7-month point on the sitting-age scale to the two curves and then across 
to the corresponding percentage values of each group. Only 42% of the x-group 
were sitting unaided by 7 months, as compared to 50% of the y-group. At 
9 months, 85% of the x-group and 90% of the y-group were sitting unaided. 
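Reading a percentage cumulative curve in either direction, as done above with Fig. 5:12, amounts to linear interpolation along its straight-line segments. The sketch below does this for the x-group only, since only that group's figures (Table 5:13) are given in this section; the y-group values cannot be recomputed from it. The 0% point at 4.5 months is an assumed starting point, and the function name `read_curve` is hypothetical. From the tabled percentages it gives 42.75% by 7 months, 85.1% by 9 months, and a 25% point near 6.3 months, which roughly agree with the values read from the graph.

```python
from bisect import bisect_left

# x-group (N = 131), Table 5:13: cumulative percentages at upper limits.
# The 0% point at 4.5 months is an assumed starting point for the curve.
upper = [4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5, 11.5, 12.5]
pct = [0.0, 6.9, 29.0, 56.5, 79.4, 90.8, 95.4, 99.2, 100.0]


def read_curve(x, xs, ys):
    """Read y at x off the broken-line curve through the points (xs, ys)."""
    i = bisect_left(xs, x)
    if i == 0:
        return ys[0]
    if i == len(xs):
        return ys[-1]
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# From an age to a percentage: per cent of x-group sitting alone by 7 and 9 months.
print(read_curve(7.0, upper, pct), read_curve(9.0, upper, pct))
# From a percentage to an age: sitting age reached by the first 25% (swap the axes).
print(round(read_curve(25.0, pct, upper), 2))
```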


EXERCISES 

The following exercises are based upon the data in Table 5:14, which consists of 
three variables and the results on each variable for two groups of college students 
(100 in each group), composed of college freshmen on the one hand and their best 
friends (among the freshman group) on the other. 

The scores of each group are given by pairs, so that each college freshman's results 
are paired with those of his best friend. The pairs are numbered in column (1) from 
1 to 100. 

Variable G (columns 2 and 3) consists of the average grades made during the 
freshman year by the students in the two groups. (These scores are not the original 
percentage grades, but are measures converted to a scale with a range from zero to 99.) 

The intelligence test variable, I.T. (columns 4 and 5), consists of the scores made 
by the students on an intelligence test administered when they entered college. 

The third variable, A (columns 6 and 7), consists of the age of each student to the 
nearest year at the time of his admission to college. 

1. Determine the range of the results of the three variables for each group of students. 

2. Set up an array for the intelligence test scores of either student group. 

3. Establish class intervals of an appropriate size for each of the three variables. 

4. Differentiate the mathematical limits from the written limits for the class 
intervals of each of the three variables. 

5. What are the mid-point values of the class intervals set up for each of the three 
variables? 

6. Make a tally and frequency distribution of the results of the two groups for each 
of the three variables, and compare the college freshmen’s results on each variable 
with those of their best friends. 

7. Compare the results of the two groups for each of the three variables by means of 
a histogram for the freshmen and a frequency polygon for their best friends. 





Table 5:14. Average Grades, Intelligence Test Scores, and Ages of 
100 College Freshmen (F) and of Their Best Friends in the Freshman Class (B) 

 (1)   (2)  (3)   (4)  (5)   (6)  (7)      (1)   (2)  (3)   (4)  (5)   (6)  (7)
 No.   Grad.      I.T.       Age           No.   Grad.      I.T.       Age
       (F)  (B)   (F)  (B)   (F)  (B)            (F)  (B)   (F)  (B)   (F)  (B)
   1    72   46    90   74    15   18       51    32   48    80   86    18   17
   2    53   73    83   85    17   16       52    34   34    97   84    17   17
   3    44   41    77   66    17   17       53    37   38    71   71    19   22
   4    21   18    87   77    17   18       54    57   27    82   72    17   17
   5    49   33    93   77    18   18       55    48   92    93   80    17   18
   6    53   41    85   86    17   19       56     0   35    79   78    17   17
   7    18   32    59   71    22   24       57    39   18    78   78    18   17
   8    41   50    89   79    17   18       58    67   59    86   95    17   17
   9    38   37    71   71    24   19       59    73   67    76   79    17   16
  10    45   42    80   85    18   17       60    48   27    87   62    17   18
  11    72   68    90   71    17   18       61    92   53    90   85    18   17
  12    27   29    57   61    22   18       62    59   76    87   83    17   15
  13    60   99    86   94    17   17       63    52   52    70   93    18   17
  14    26   26    65   79    17   18       64    41   37    66   72    17   18
  15    69   52    99   90    17   17       65    50   80    81   89    17   17
  16    51   65    87   88    17   18       66    34   34    84   97    17   17
  17    61   69    95   94    16   16       67    55   78   102  108    16   17
  18     9   29    77   75    19   18       68    52   61    84   78    17   19
  19    22   50    84   81    17   17       69    13   32    86   71    18   24
  20    32   13    71   86    24   18       70    22   43    81   79    17   18
  21    42   45    85   80    17   18       71    43   29    77   76    19   19
  22    52   67    93   93    17   17       72    36   56    81   91    19   17
  23    51   55    76  102    17   16       73    69   61    94   95    16   16
  24    99   94    94   86    17   17       74    22   31    82   91    21   18
  25    74   79    90   84    17   16       75    69   57    91   91    16   16
  26    74   53   104   78    18   19       76    40   40    83   66    19   18
  27    73   53    85   83    16   17       77    93   67    85   65    16   17
  28    18   21    77   87    18   17       78    41   62    66   89    18   17
  29    38   21    73   77    18   17       79    61   52    78   84    19   17
  30    25   11    79   83    18   19       80    43   40    83   83    21   19
  31    67   50    86   90    17   17       81    37   41    72   66    18   17
  32    55   62    84   87    16   17       82    71   81    94   87    16   15
  33    24   64    83   95    18   17       83    17   42    73   71    16   18
  34    79   74    84   90    16   17       84    74   71    92   94    19   16
  35    32   43    91   77    18   19       85    68   72    71   90    18   17
  36    62   36    89   74    17   17       86    11   25    83   79    19   18
  37    29   59    61   87    19   18       87    81   71    87   94    15   16
  38    27   57    72   82    17   17       88    37   29    78   61    17   18
  39    37   46    87   91    17   15       89    27   27    83   87    20   17
  40    46   53    91   83    15   17       90    33   37    75   72    16   18
  41    52   67    89   91    17   16       91    77   44    68   70    20   17
  42    69   64    86   83    17   17       92    52   53    67   78    17   19
  43    53   40    78   83    20   18       93    52   38    72   84    16   17
  44    38   45    82   84    17   18       94    21   32    67   83    18   17
  45    45   50    82   81    17   17       95    44   77    70   68    17   20
  46    69   29    90   69    18   27       96    64   24    95   83    17   18
  47    83   66   114   75    17   17       97    66   63    79  109    16   16
  48    53   52    78   57    19   17       98    72   38    85   84    16   17
  49    67   93    65   85    17   16       99    32   21    83   67    17   18
  50    32   43    71   71    18   17      100    28   18    67   78    18   17


8. Take the first 50 cases of the freshman group for each of the three variables; make 
a percentage frequency distribution of each of these sub-groups, and compare and 
interpret the results with those for the total group by means of (a) percentage 
frequency polygons and (b) cumulative percentage frequency distributions. 






CHAPTER 6 


The Centile Point Method for Variate Data 


A. CENTILES AND THE DESCRIPTION OF VARIATE DATA 

A centile point is a value on the score scale of a variable such that a given 
percentage of the frequencies of the distribution lies below that point value 
and the remaining percentage of the frequencies lies above it. A complete 
scale of centile values divides a distribution of frequencies into 100 equal 
parts, so that the total frequencies are divided into successive groups, each 
of which includes 1% of all the frequencies. A given centile point value always 
means exactly what it says, viz., that a certain percentage of the frequencies 
lies below the given centile value and the remaining frequencies lie above it. 
Thus, the 33rd centile of a distribution is always a value on the scale of 
measures such that 33% of the frequencies are below this value and 67% are 
above. 

The purpose of any centile value is to provide a measure of a variable that 
will serve to summarize an aspect of the distribution. The centile method is a 
particularly valuable statistical technique because the basic interpretation 
of centile measures is not limited by the form of the distribution. 

Centiles were originally called percentiles, and this term is still in 
considerable use. However, we have abbreviated the term to centiles for the sake 
of simplicity and of consistency with other commonly used measures which 
are based on the centile method, such as quartiles (never “per quartiles”), 
deciles (not “per deciles”), etc. We shall symbolize a centile point value by C 
and the appropriate numerical subscript to identify the particular value; thus, 
the 33rd centile point value is symbolized by $C_{33}$, the 56th centile by $C_{56}$, etc. 

The basic assumption in using the centile method for describing the 
distribution of a variable is that the measures or scores of the distribution 
form a continuous series. This assumption, as we have seen, is inherent in the 
definition of a variable; that is, a variable yields a continuous series of 
measures ranging from least to most. It should be emphasized that in statistics 
this assumption is followed even though the actual distribution of frequencies 
may be based on observations that are discrete in character, as, for example, 
a distribution that gives for a city the number of children per family. 

Centile Point Values vs. Centile Intervals 

In computing and using centiles to summarize the data of a variable, it is 
important to distinguish clearly between a centile point value and a centile interval. 








THE CENTILE POINT METHOD FOR VARIATE DATA 

A Centile Point Value 

A centile point is the value of a point on a scale of measures or scores which 
divides the frequencies of the distribution into two parts such that the sum 
of the two parts is equal to all the frequencies. Like a knife edge, it divides N, 
the total number of frequencies, into two parts. In practice, some frequencies 
of a distribution may have integral values corresponding to integral values of 
centiles. Such frequencies are ordinarily identified with the interval 
immediately above the given centile point value. Thus, if $C_{50}$ is equal to 76.0, 
a score of 76 is located in the centile interval whose lower limit is 76.0. A 
frequency at the extreme upper range of scores is identified with the centile 
interval whose lower limit is $C_{99}$, because there can be no frequencies beyond $C_{100}$. 

There are, by definition, 101 centile point values for any distribution. 
These values range from zero ($C_0$) to 100 ($C_{100}$). A full scale of centile point 
values, therefore, divides the frequencies of a distribution into 100 equal 
intervals, or parts. The 50th centile point value ($C_{50}$) divides the distribution 
of frequencies into two equal parts such that 50% of them are distributed 
above this centile point value and 50% are below. The 50th centile point value 
is commonly called the median of a distribution. 
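As a preview, and only as a sketch, the median of the sitting-age data in Table 5:13 can be estimated by straight-line interpolation within the class interval that contains the (N/2)th frequency, on the assumption stated in Chapter 5 that the frequencies are spread evenly throughout each interval. This amounts to reading the percentage cumulative curve at 50%; it is not necessarily identical with the computational procedure developed later in this chapter. The function name `centile` is hypothetical.

```python
def centile(p, lower_limits, width, freqs):
    """C_p for grouped data by straight-line interpolation within an interval.

    Assumes the frequencies of each class interval are spread evenly through it.
    """
    n = sum(freqs)
    target = p / 100 * n           # frequencies that must lie below C_p
    below = 0                      # frequencies below the current interval
    for lo, f in zip(lower_limits, freqs):
        if f > 0 and below + f >= target:
            return lo + width * (target - below) / f
        below += f
    return lower_limits[-1] + width  # reached only if no interval qualifies


# Lower mathematical limits (width 1 month) and frequencies from Table 5:13.
lower = [4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5, 11.5]
freqs = [9, 29, 36, 30, 15, 6, 5, 1]

median = centile(50, lower, 1.0, freqs)
print(round(median, 2))  # 7.26
```

Here $N/2 = 65.5$ falls in the 7-month interval, which begins at 6.5 with 38 frequencies below it, so the sketch returns $6.5 + (65.5 - 38)/36 \approx 7.26$ months.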

A Centile Interval 

A centile interval is, by definition, the range between any two successive 
centile points of a distribution. It consequently includes 1% of the frequencies. 
The first centile interval of any distribution lies between $C_0$ and $C_1$, in other 
words, between the extreme lower range and the score value of the first centile. 
The 50th centile interval is the score range between $C_{49}$ and $C_{50}$. The 100th 
centile interval is the range between $C_{99}$ and $C_{100}$. Any centile interval of any 
distribution thus includes exactly 1% of the frequencies, although the actual 
score range of such intervals may vary considerably. For example, in a 
distribution similar to the normal frequency type, the score range of measures 
for centile intervals near the center of the distribution is much less than the 
range of centile intervals near either extreme of the distribution. This is 
illustrated in Fig. 6:1, in which centile intervals of a bell-shaped distribution 
are compared with centile intervals of a rectangular and a J-type distribution. 

Quartiles, Terciles, Quintiles, Deciles, and Vigintiles 

In practice, several centile point values are used for purposes of 
summarization. Thus, in addition to the median ($C_{50}$) already mentioned, $C_{25}$ and $C_{75}$ 
are used and described as the lower quartile ($Q_1$) and the upper quartile ($Q_3$) 
points within a distribution ($Q_2$ is the same as $C_{50}$, the median). The range 
between $C_{25}$ and $C_{75}$ is called the inter-quartile range and always includes the 
middle 50% of the frequencies of any distribution (see Fig. 6:1). Each of 
the four quartile intervals includes 25% of the frequencies. 

The range from $C_{33}$ to $C_{67}$ is called the inter-tercile range and includes the 



CENTILES AND THE DESCRIPTION OF VARIATE DATA 


Fig. 6:1. Centile Intervals of a Normal Bell-Shaped Distribution Compared with 
Similar Intervals of a Rectangular and a J-Type Distribution. (All Three Distributions 
Have Similar Areas.) [Panel notes: Centile intervals of a normal, bell-shaped type of 
distribution are narrow in range at the center of the distribution because of the great 
concentration of frequencies around the median, or modal point. In a rectangular type 
of distribution, all centile intervals are equal in size because the frequencies are 
uniformly distributed throughout the scale.]














middle one-third of the frequencies. $C_{33}$ is ordinarily identified as the lower 
tercile ($T_1$) and $C_{67}$ as the upper tercile ($T_2$). These tercile points, $T_1$ and $T_2$, 
thus divide a distribution of frequencies into three equal parts, whereas the 
quartile points divide it into four equal parts. $C_{10}$ and $C_{90}$ are also commonly 
used as centile point values for the summarization of a distribution. The 
range from $C_{10}$ to $C_{90}$ includes the middle 80% of the frequencies and is called 
the D range ($D = C_{90} - C_{10}$). 
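The same hypothetical `centile` sketch, redefined here so that the example stands alone, gives the quartile, tercile, and D-range summaries for the sitting-age data of Table 5:13. The figures are interpolated estimates under the even-spread assumption, not values quoted from the text.

```python
def centile(p, lower_limits, width, freqs):
    """C_p for grouped data, interpolating linearly within a class interval."""
    n = sum(freqs)
    target = p / 100 * n
    below = 0
    for lo, f in zip(lower_limits, freqs):
        if f > 0 and below + f >= target:
            return lo + width * (target - below) / f
        below += f
    return lower_limits[-1] + width


lower = [4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5, 11.5]   # Table 5:13, width 1 month
freqs = [9, 29, 36, 30, 15, 6, 5, 1]

q1, q3 = centile(25, lower, 1.0, freqs), centile(75, lower, 1.0, freqs)
t1, t2 = centile(33, lower, 1.0, freqs), centile(67, lower, 1.0, freqs)
c10, c90 = centile(10, lower, 1.0, freqs), centile(90, lower, 1.0, freqs)

print(f"inter-quartile range: {q1:.2f} to {q3:.2f} ({q3 - q1:.2f} months)")
print(f"inter-tercile range:  {t1:.2f} to {t2:.2f}")
print(f"D range = C90 - C10 = {c90 - c10:.2f} months")
```

For these data the inter-quartile range runs from about 6.32 to 8.31 months, and the D range is about 3.79 months.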

Vigintiles, by definition, are the centile point values of a distribution for 
successive intervals that include 5% of the frequencies. The first vigintile 
point value, $Vn_1$, is equal to $C_5$; $Vn_2 = C_{10}$; $Vn_3 = C_{15}$; $\ldots$; $Vn_{20} = C_{100}$. 
The first vigintile interval lies between $Vn_0$ and $Vn_1$ (or $C_0$ and $C_5$); the twentieth 
vigintile interval, between $Vn_{19}$ and $Vn_{20}$ (or $C_{95}$ and $C_{100}$). 

Deciles give the point values of a distribution for successive intervals that 
include 10% of the frequencies. The first decile point value, $D_1$, is equal to 
$C_{10}$; $D_2 = C_{20}$; $D_3 = C_{30}$; $\ldots$; $D_{10} = C_{100}$. The first decile interval lies within 
the range of $D_0$ and $D_1$ (or $C_0$ and $C_{10}$); the second decile interval is $D_1$ to $D_2$ 
(or $C_{10}$ to $C_{20}$); the tenth decile interval is $D_9$ to $D_{10}$ (or $C_{90}$ to $C_{100}$). 

Distributions of frequencies are sometimes divided into five equal parts, 
instead of twenty, ten, four, or three; in this case the divisions are known as 
quintile intervals. The first quintile point value, $Qn_1$, is at $C_{20}$; $Qn_2 = C_{40}$; 
$\ldots$; $Qn_5 = C_{100}$. The first quintile interval lies within the range of $Qn_0$ and 
$Qn_1$ (or $C_0$ and $C_{20}$); the second quintile interval is $Qn_1$ to $Qn_2$ (or $C_{20}$ to $C_{40}$); 
the fifth quintile interval is $Qn_4$ to $Qn_5$ (or $C_{80}$ to $C_{100}$). 
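The correspondence between these named divisions and centile subscripts follows one rule: the j-th point of a division into k equal parts is the centile with subscript 100j/k. A small sketch, with a hypothetical function name, makes the rule explicit; rounding to whole centiles is what places the terciles at $C_{33}$ and $C_{67}$, as in the text.

```python
def division_points(parts):
    """Centile subscripts of the points dividing a distribution into equal parts.

    The j-th point of a division into `parts` intervals is C with subscript
    100*j/parts, rounded to a whole centile (so terciles fall at C33 and C67).
    """
    return [round(100 * j / parts) for j in range(parts + 1)]


for name, parts in [("terciles", 3), ("quartiles", 4), ("quintiles", 5),
                    ("deciles", 10), ("vigintiles", 20)]:
    print(name, division_points(parts))
```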

Tercile, quartile, and quintile divisions of a frequency distribution are 
extensively employed in psychological measurement, particularly in analyzing 
the functional implications of a test. Thus, a test is a useful instrument for 
measurement if a large proportion of those whose test scores are in the upper 
tercile also prove to do successful or satisfactory work in a given job, and if 
those whose test scores are in the lower tercile prove generally to do 
unsatisfactory work. 


Summary of Some Commonly Used Centile Measures 

Type               Number of Intervals    Measure of Dispersion
Centiles (C)       100                    Range, $C_0$ to $C_{100}$
Vigintiles (Vn)    20
Deciles (D)        10                     D range, $D_1$ to $D_9$
Quintiles (Qn)     5
Quartiles (Q)      4                      Inter-quartile range, $Q_1$ to $Q_3$
Terciles (T)       3                      Inter-tercile range, $T_1$ to $T_2$


In addition to these measures, the centile measures of deviation, viz., the 
quartile deviation and the tercile deviation, will be developed later in this 
chapter. 





CENTILES BY THE GRAPHIC METHOD 

Comparative Implications of Centile Measures 

Centile point values and the various statistical measures derived from them 
provide descriptive statistics that are applicable to any kind of variate 
distribution. However, caution is necessary in using centile values for 
summarizing a distribution, because these values are computed with respect to 
the distribution of frequencies rather than with respect to the unit values of 
the scores. Although the 50th centile interval, for example, always gives the 
range of 1% of the frequencies between $C_{49}$ and $C_{50}$ for any distribution, the 
scale values of scores will, as Fig. 6:1 showed, have different implications for 
different forms of distribution. For a normal distribution the score range of 
the 50th centile interval is relatively small compared to the range of a 
centile interval at either extreme of the distribution. On the other hand, for a 
J-type distribution, the score range of centile intervals is smaller at the 
extremes than near the middle of the scale. 

It is not misleading to compare the same centile intervals of two 
distributions having the same form; however, when two distributions differ markedly 
in form, such comparisons often lead to ambiguous or erroneous 
interpretations. This is why it is always well for the investigator to ascertain the forms 
of the distributions with which he is working, in order to know whether, in 
reality, they are comparable. The median, for example, is a measure of 
central tendency only when a distribution shows a central tendency, that is, 
when the greatest concentration of frequencies is near its center. 

The Determination of Centiles 

In practice, two methods are used for determining any centile value of a 
distribution. One is computational; the other, graphic. Both provide centile 
point values from which the ranges of centile intervals and other measures 
derived from centiles can be determined. For most practical purposes, the 
graphic method is as satisfactory as the computational method. We shall 
illustrate the graphic method first, since it serves the double purpose of 
yielding the desired centile determinations and of demonstrating what is 
basically involved in the centile method. 

B. CENTILES BY THE GRAPHIC METHOD 
The Centile Graph 

The data in Table 5:13 were used for Fig. 5:11, showing the percentage 
cumulative frequency distribution. These data showed the distribution of the 
sitting ages of a group of 131 infants, i.e., the ages, in months, at which each 
infant was first able to support himself alone in a sitting position for at least 
one minute. Nine were able to do this at 5 months, and all but one could do 
it by 11 months; the last infant did so at 12 months. The same data are used 
for Fig. 6:2, from which any centile point value of the distribution can readily 
be estimated. By using the 



132 


THE CENTILE POINT METHOD FOR VARIATE DATA 


centile method for bringing together additional summary facts about these 
data, we can obtain the sitting ages for any proportion of the group, as for 
example the range in sitting ages of the earliest 50 %, or of the middle 50 %, or 
of the last 10 %, etc. In order to arrange the data of the frequency distribution 
in a form convenient for making the centile graph in Fig. 6:2, we obtain the 
cumulative frequencies and the x)ercentage cumulative frequencies by the 
methods described in Chapter 5 . 


Fig. 6:2. The Centile Graph. (Based on the Sitting-Age Data of 131 Infants, Table 5:13)



In making a centile graph, the initial procedure is the same as for the percentage
cumulative frequency curve in Figs. 5:11 and 5:12. In addition, as
indicated in Fig. 6:2, the scale of centile values is laid off on the ordinate at
the left side of the chart, with the percentage cumulative frequencies at
corresponding ordinate points on the right side of the chart. As usual, the
scale of measures (sitting age) is laid off on the abscissa at the bottom of the
chart. The mid-point values of the class intervals, rather than the values of
their limits, are usually noted on this scale.

In practice, the centile graph is drawn on millimeter paper so as to provide
many subdivisions on both the centile and the score scales. Fairly accurate
determinations of centile point values are thereby obtainable, especially if
the entire graph is scaled on a large sheet of millimeter paper. In fact, with
extra large paper and consequently a graph that is very large and carefully
and accurately drawn, estimates of centile values can be just as accurate for
any practical purpose as computed values.

Determining the Score Values of Centiles from a Centile Graph

In order to make centile point estimates from a centile graph, the centile
whose value is sought is first located on the ordinate scale. With this as the
starting point, the scale value of the given centile is obtained on the abscissa
by means of a vertical line projected down from a point on the percentage
cumulative frequency curve, the point being exactly opposite the centile
point whose value is to be determined. This is illustrated by the horizontal
and vertical projections on Fig. 6:2. Thus, the position of C90, the 90th centile
(or ninth decile), on the ordinate centile scale is projected horizontally to
the curve, and a vertical line is dropped from this point to the scale of measures
on the abscissa. Each projected line is thus drawn perpendicular to its respective
scale. The value of C90 is seen to be equal to 9.4 months sitting age.
Similarly, the value of C75 is determined from the centile graph as equal to
8.3 months.
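Reading a centile graph can be imitated in code. The sketch below is a minimal Python illustration under one stated assumption: the percentage cumulative frequency curve is drawn as straight segments joining the points plotted at the upper class limits of the sitting-age data (Table 6:2). The function name `read_centile` is hypothetical.

```python
# Percentage cumulative frequencies for the sitting-age data (Table 6:2),
# plotted at the upper mathematical limit of each class interval.
limits = [4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5, 11.5, 12.5]
cum_freq = [0, 9, 38, 74, 104, 119, 125, 130, 131]
N = cum_freq[-1]
cum_pct = [100 * c / N for c in cum_freq]

def read_centile(c: float) -> float:
    """Go across from centile c to the curve, then down to the age scale."""
    for k in range(1, len(cum_pct)):
        lo, hi = cum_pct[k - 1], cum_pct[k]
        if lo <= c <= hi and hi > lo:
            x0, x1 = limits[k - 1], limits[k]
            # Straight-line segment of the curve between two plotted points.
            return x0 + (c - lo) / (hi - lo) * (x1 - x0)
    raise ValueError("centile must lie between 0 and 100")

for c in (90, 75):
    print(f"C{c} = {read_centile(c):.1f} months")
```

With straight segments, the "graph reading" coincides with the computational method described later in this chapter; a freehand curve would give slightly different readings.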

Table 6:1 brings together the estimates of the seven centile point values
located on the centile graph in Fig. 6:2. These values enable the median, the
inter-quartile range, the inter-tercile range, and the D range to be readily
stated, thus providing useful descriptive information about a variate
distribution.

Table 6:1. Determination of Centile Values from the Centile Graph in Fig. 6:2

Centile Point                       Sitting Age
C90 (or D9, the 9th decile)         9.4 months
C75 (or Q3, the 3rd quartile)       8.3 months
C67 (or T2, the 2nd tercile)        8.0 months
C50 (or Mdn, the median)            7.3 months
C33 (or T1, the 1st tercile)        6.6 months
C25 (or Q1, the 1st quartile)       6.3 months
C10 (or D1, the 1st decile)         5.6 months

Median (C50) = 7.3 months
Inter-quartile range (C25 to C75) = 6.3 to 8.3 months
Inter-tercile range (C33 to C67) = 6.6 to 8.0 months
D range (C10 to C90) = 5.6 to 9.4 months


Vigintiles, quintiles, and any other centile values can be readily obtained
from the centile graph in Fig. 6:2. If many such measures are needed, however,
it is well to set up the graph with larger dimensions so that the values can be
accurately read from it.

The foregoing procedure for determining the score value of any centile
point can be reversed to yield the centile interval value for any score or
measure on the abscissa scale. Thus, a sitting age of 5 months is in the 3rd centile
interval; a sitting age of 8 months is in the 67th centile interval, etc.

C. THE COMPUTATION OF CENTILE VALUES

The computation of any centile value for a variable involves (1) locating the
desired centile point value in the distribution of frequencies, and (2) determining
the score value at the point thus located.

We shall illustrate these two steps for the data in Fig. 6:2, whose original
frequency distribution is shown in Table 6:2. As indicated in Chapter 5, the
computation of centile values is simplified by using the cumulative frequency
distribution. The usual cumulation of frequencies from the lower end of the
distribution is presented in the next-to-last column of Table 6:2. The last
column shows the frequencies cumulated from the upper end, which simplifies
checking all centile values computed from the lower end of the distribution.


Table 6:2. Distribution of the Sitting Ages of 131 Infants (with Cumulative
Frequency Distribution for Aid in Computing and Checking Centile Values)

Sitting Age   Class-Interval      f    c.f.                 c.f.
in Months     Limits                   (from 4.5 Months)    (from 12.5 Months)
12            11.5 to 12.5        1    131                  1
11            10.5 to 11.5        5    130                  6
10            9.5 to 10.5         6    125                  12
9             8.5 to 9.5          15   119                  27
8             7.5 to 8.5          30   104                  57
7             6.5 to 7.5          36   74                   93
6             5.5 to 6.5          29   38                   122
5             4.5 to 5.5          9    9                    131
                             N = 131
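The two cumulative columns of Table 6:2 can be generated mechanically from the f column. A minimal Python sketch using only the standard library; the variable names are illustrative, not the author's notation.

```python
from itertools import accumulate

# Table 6:2, listed from the lowest sitting age upward.
ages = [5, 6, 7, 8, 9, 10, 11, 12]
f = [9, 29, 36, 30, 15, 6, 5, 1]

cf_low = list(accumulate(f))                    # cumulated from 4.5 months
cf_high = list(accumulate(reversed(f)))[::-1]   # cumulated from 12.5 months

for age, fi, lo, hi in zip(ages, f, cf_low, cf_high):
    print(f"{age:>2} months  f={fi:>2}  c.f. from 4.5={lo:>3}  c.f. from 12.5={hi:>3}")
```

Each row satisfies c.f.(from 4.5) + c.f.(from 12.5) = N + f, since the interval's own frequencies are counted in both columns; this is a quick check that both columns were cumulated correctly.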


The Location of a Centile Point 

A given centile point in a distribution of frequencies is located by computing
its corresponding percentage of N, where N is, as usual, the total
number of frequencies. Thus, C50 is located in the distribution in Table 6:2
as follows:

The percentage value of C50 = 50%, and N = 131.

Therefore,

$$\frac{50}{100}N = \frac{1}{2}(131) = \frac{131}{2} = 65.5$$

C50 is consequently located as a point value such that 65.5 of the frequencies
are above this point and 65.5 are below this point. Frequencies are often
fractionated in order to compute a centile point value (the distribution is
assumed to be continuous).

We now need to determine the score value of the point which is at
the upper limit of the 65.5th frequency in the distribution of 131 frequencies.


Inspection of the cumulated frequencies in Table 6:2 reveals that this point 
value is in the class interval 6.5 to 7.5 months. This is the case since the 
upper limit of this interval includes 74 cases cumulated from the lower end 
of the distribution, whereas the upper limit of the preceding class interval 
(5.5 to 6.5 months) includes only 38 cases. Having thus located the class 
interval which contains the desired centile point value, we next proceed to 
interpolate its score value in the interval. 
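Locating the class interval that holds a given cumulated frequency is a search over the c.f. column. A minimal sketch using Python's `bisect` module, assuming the c.f. values are listed from the low-score end; `locate` is a hypothetical helper name.

```python
from bisect import bisect_left

# Table 6:2: lower mathematical limits and frequencies cumulated from 4.5 months.
lower_limits = [4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5, 11.5]
cf = [9, 38, 74, 104, 119, 125, 130, 131]
width = 1.0

def locate(target: float) -> int:
    """Index of the first class interval whose cumulated frequency reaches target."""
    return bisect_left(cf, target)

k = locate(0.50 * 131)   # the 65.5th frequency
print(f"{lower_limits[k]} to {lower_limits[k] + width} months")
```

`bisect_left` returns the first position whose c.f. is at least the target, which is exactly the rule applied above: 38 falls short of 65.5, while 74 reaches it.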

Interpolating the Score Value of a Centile Point 

The proportion of the frequencies in the class interval 6.5 to 7.5 months
that will include exactly 65.5 of the cumulated frequencies must first be
determined. There are 36 frequencies in the class interval in which the
65.5th frequency is located; hence, the desired proportion is equal to

$$\frac{65.5 - 38}{36} = \frac{27.5}{36} = .76$$

where 38 is the total number of frequencies below the class interval 6.5 to
7.5 months.

This result, .76, represents the proportion of the frequencies in the lower
part of the seven-month class interval which, together with the 38 frequencies
cumulated through the six-month interval, will give a total of 65.5 frequencies.
There are 38 frequencies below the seven-month interval. In order to reach
65.5, 27.5 more are needed. Since the seven-month interval has a total of
36 frequencies, 27.5/36 gives the proper proportion of them, namely, .76.

The score value of C50, thus located within the interval, will be equal to the
lower mathematical limit of the seven-month interval, plus .76 of the range
of the interval. Since the unit size of the interval is one month of sitting age,

$$C_{50} = 6.5 + .76(1.0) = 7.26, \text{ or } 7.3 \text{ months}$$

The value of C50 for this distribution is therefore 7.3 months sitting age. This
is also the median value, since, as we have seen, the median of a distribution
is located at C50.
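The first interpolation procedure, written out step by step as a minimal Python sketch (the variable names are illustrative):

```python
# Quantities read from Table 6:2 for C50.
target = 131 / 2     # 65.5th frequency
f_below = 38         # frequencies below 6.5 months
f_in = 36            # frequencies in the 6.5 to 7.5 interval
lower = 6.5          # lower mathematical limit of that interval
width = 1.0          # class-interval size, in months

proportion = (target - f_below) / f_in   # share of the interval needed
c50 = lower + proportion * width
print(f"proportion = {proportion:.2f}; C50 = {c50:.2f} months")
```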

In interpolating a score value for a centile point, it is assumed that the 
frequencies are uniformly distributed throughout the interval in which the 
point is located. 

An alternative procedure for the interpolation is as follows: The score
value of one frequency, for the interval in which the centile point value is
located, is first computed. In the above example, the score value of one
frequency in the seven-month interval is equal to

$$\frac{i}{f_i} = \frac{1.0}{36} = .028$$

where f_i equals the number of frequencies in the interval and i is the unit size
of the class interval (in this case, 1 month).

The score value of each frequency in this interval is therefore equal to .028 
month of sitting age. Having already determined that the first 27.5 frequencies 
of this interval are needed in order to arrive at a point which will include the 
lowest 65.5 frequencies of the whole distribution, we therefore multiply these 
27.5 frequencies by .028 and add the result to the lower mathematical limit 
value of the seven-month class interval. Thus, 

$$C_{50} = 6.5 + 27.5(.028) = 7.27, \text{ or } 7.3 \text{ months}$$

This value should of course be the same, except for dropped decimals, as 
that obtained with the first interpolation procedure. 
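The small difference between 7.27 and 7.26 comes entirely from rounding the score value of one frequency. In the minimal sketch below (names illustrative), carrying that value as .028, as the text does, yields 7.27, while the unrounded value 1/36 reproduces 7.26 from the first procedure; both round to 7.3 months.

```python
target, f_below, f_in, lower, width = 65.5, 38, 36, 6.5, 1.0

step = width / f_in                       # score value of one frequency
step_text = round(step, 3)                # .028, as carried in the text
c50_text = lower + (target - f_below) * step_text
c50_exact = lower + (target - f_below) * step

print(f"one frequency = {step:.4f} month")
print(f"C50 with .028: {c50_text:.2f}; with unrounded step: {c50_exact:.2f}")
```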

From the preceding development, the formula for the centile may be stated
as follows, when the frequencies are cumulated from the low-score end of a
distribution:

$$C = X_l + \left(\frac{pN - f_b}{f_i}\right) i \qquad \text{[6:1] Any centile } C$$

where $X_l$ is equal to the mathematical value of the lower limit of the interval in
which the desired centile value is located; p is the proportion of the distribution
needed for any particular centile value, as for example, p = 33/100 when C33
is desired; N is the total number of frequencies in the distribution; $f_b$ is the
number of frequencies below the lower limit, $X_l$; $f_i$ is the number of frequencies
in the interval in which the centile is located; and i is the size of the class
interval.
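Formula 6:1 translates directly into a small function. This is a minimal sketch, assuming the class intervals are supplied as parallel lists of lower mathematical limits and frequencies in ascending order of score; `centile` is a hypothetical helper name.

```python
from bisect import bisect_left
from itertools import accumulate

def centile(p, lower_limits, f, width):
    """Formula 6:1: C = X_l + ((pN - f_b) / f_i) * i, cumulating from the low end."""
    cf = list(accumulate(f))
    target = p * cf[-1]                  # pN
    k = bisect_left(cf, target)          # interval holding the (pN)th frequency
    f_b = cf[k - 1] if k > 0 else 0      # frequencies below its lower limit X_l
    return lower_limits[k] + (target - f_b) / f[k] * width

# Sitting-age data of Table 6:2.
lower_limits = [4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5, 11.5]
f = [9, 29, 36, 30, 15, 6, 5, 1]

print(round(centile(0.33, lower_limits, f, 1.0), 2))   # C33
```

Because `bisect_left` always lands on an interval whose cumulated frequency actually rises past pN, the divisor f[k] is never zero, even if some classes are empty.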


Checking the Computed Centile Value 

It is important in statistical work not only to check computations, but, if
possible, to employ a method of checking which is relatively independent of
the particular steps used in making the original computations. Such a method
is readily found by working from the upper end of the distribution. The
centile point value of C50 will be located so as to include the same number of
frequencies from the upper end of the distribution as from the lower end,
since the value of C50 is located so as to divide the frequencies into halves.

Inspection of the last column of Table 6:2, in which the 131 frequencies have
been cumulated from the upper end of the distribution to facilitate checking,
reveals that 57 of the cumulated frequencies lie above the lower limit of the
eight-month class interval. We need, therefore, to go to the next lower interval,
viz., seven months, to find the point value for the 8.5 additional frequencies
(65.5 - 57 = 8.5) needed to give a total of 65.5. The score value of this
point is then interpolated as before. Thus,

$$\frac{65.5 - 57}{36} = \frac{8.5}{36} = .24$$


This interpolated value of .24 is multiplied by the range value of the interval
and the product is subtracted from the upper mathematical limit of the
seven-month interval in order to give the value of C50 on the score scale. Thus,

$$C_{50} = 7.5 - .24(1.0) = 7.26, \text{ or } 7.3 \text{ months}$$

If the second method of interpolation is used, we again determine the
score value of one frequency in the seven-month interval (.028) and obtain the
following:

$$C_{50} = 7.5 - 8.5(.028) = 7.26, \text{ or } 7.3 \text{ months}$$
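The check from the upper end can be carried out alongside the original computation. A minimal sketch with the quantities from Table 6:2 (names illustrative):

```python
target = 131 / 2   # 65.5
f_above = 57       # frequencies above 7.5 months (last column of Table 6:2)
f_below = 38       # frequencies below 6.5 months
f_in = 36          # frequencies in the seven-month interval
width = 1.0

from_low = 6.5 + (target - f_below) / f_in * width    # original computation
from_high = 7.5 - (target - f_above) / f_in * width   # independent check
print(f"from low end: {from_low:.2f}; from high end: {from_high:.2f}")
```

The two results agree exactly, apart from floating-point noise, because the frequencies below, within, and above the interval account for the whole distribution: 38 + 36 + 57 = 131 = N.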


The computations for the other six centile values previously estimated
from the centile graph in Fig. 6:2 are presented in Table 6:3. It will be
observed that the principles of computation just described for C50 apply to any
centile value. Thus, the score value of C90 is determined by first locating its
position in the distribution of frequencies: 9/10 of 131 = 117.9. Hence the
value of C90 is such as to divide the frequencies into two parts, with 117.9
below C90 and the remainder (131 - 117.9 = 13.1) above C90. The value of
C90 is next found to be located (from the cumulative frequency distribution
in Table 6:2) in the nine-month class interval (8.5 to 9.5). This interval has
15 frequencies, and the interpolated value of C90 is computed to be 9.4 months
of sitting age.


Table 6:3. The Computation of Centile Point Values
(For the Sitting-Age Data in Table 6:2)

Centile  Division of Frequencies     Location of  Interpolated Value                   Value of Centile
         (Cumulated from the         Interval     Within Class Interval                (in Months)
         Lower End of Distribution)
C90      (90/100)(131) = 117.9       8.5 to 9.5   (117.9 - 104)/15 = .93(1.0) = .93    8.5 + .93 = 9.43 [or 9.4]
C75      (75/100)(131) = 98.25       7.5 to 8.5   (98.25 - 74)/30 = .81(1.0) = .81     7.5 + .81 = 8.31 [or 8.3]
C67      (67/100)(131) = 87.77       7.5 to 8.5   (87.77 - 74)/30 = .46(1.0) = .46     7.5 + .46 = 7.96 [or 8.0]
C50      (1/2)(131) = 65.5           6.5 to 7.5   (65.5 - 38)/36 = .76(1.0) = .76      6.5 + .76 = 7.26 [or 7.3]
C33      (33/100)(131) = 43.23       6.5 to 7.5   (43.23 - 38)/36 = .15(1.0) = .15     6.5 + .15 = 6.65 [or 6.6]
C25      (25/100)(131) = 32.75       5.5 to 6.5   (32.75 - 9)/29 = .82(1.0) = .82      5.5 + .82 = 6.32 [or 6.3]
C10      (10/100)(131) = 13.1        5.5 to 6.5   (13.1 - 9)/29 = .14(1.0) = .14       5.5 + .14 = 5.64 [or 5.6]
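Applying Formula 6:1 to each of the seven centiles reproduces the right-hand column of Table 6:3. A minimal sketch, assuming the interval limits and frequencies of Table 6:2; `centile` is a hypothetical helper that takes the centile number (90 for C90, and so on).

```python
from bisect import bisect_left
from itertools import accumulate

lower_limits = [4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5, 11.5]
f = [9, 29, 36, 30, 15, 6, 5, 1]
cf = list(accumulate(f))
N, width = cf[-1], 1.0

def centile(pct: float) -> float:
    """Formula 6:1 applied to the sitting-age data of Table 6:2."""
    target = pct / 100 * N
    k = bisect_left(cf, target)
    f_b = cf[k - 1] if k else 0
    return lower_limits[k] + (target - f_b) / f[k] * width

table = {pct: centile(pct) for pct in (90, 75, 67, 50, 33, 25, 10)}
for pct, value in table.items():
    print(f"C{pct}: {value:.2f} [or {value:.1f}] months")
```

The second decimal may differ by one unit from the table in a few rows, because the table rounds the interpolated proportion to two places before adding it to the class limit; the one-decimal values are identical.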


The procedure for checking any centile is also based on the same principles
as were described for checking C50. Thus, to check C90, this centile is located
from the upper end of the distribution. The point from this end that will
exactly correspond with C90, taken from the lower end of the distribution,
will of course be 10 per cent of the way down in the distribution of frequencies.
Thus, 10/100 of 131 = 13.1, and this frequency is seen (from the last column
in Table 6:2) to be located in the interval 8.5 to 9.5 months. There are 12
frequencies above the upper limit of this interval, and 15 within the interval.
Hence, interpolating for C90 and subtracting the result from the upper limit
of the interval, we have:

$$C_{90} = 9.5 - \frac{13.1 - 12}{15}(1.0) = 9.5 - .07 = 9.43, \text{ or } 9.4 \text{ months}$$


The formula for checking any centile value, with the frequencies cumulated
from the high-score end of the distribution, is as follows:

$$C = X_u - \left(\frac{(1 - p)N - f_a}{f_i}\right) i \qquad \text{[6:1a] Any centile (check formula)}$$

where $X_u$ is equal to the mathematical value of the upper limit of the interval
in which the desired centile value is located; $f_a$ is the number of frequencies
above the upper limit, $X_u$; and p, N, $f_i$, and i are the same as for Formula 6:1.
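The check formula can be coded as a mirror image of Formula 6:1 and used to confirm every entry of Table 6:3. A minimal sketch under the same assumptions as before; both function names are hypothetical, and agreement is judged with `math.isclose` to allow for floating-point rounding.

```python
from bisect import bisect_left
from itertools import accumulate
from math import isclose

lower_limits = [4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5, 11.5]
f = [9, 29, 36, 30, 15, 6, 5, 1]          # listed from the low-score end
N, width = sum(f), 1.0

def centile_low(p: float) -> float:
    """Formula 6:1, frequencies cumulated from the low-score end."""
    cf = list(accumulate(f))
    target = p * N
    k = bisect_left(cf, target)
    f_b = cf[k - 1] if k else 0
    return lower_limits[k] + (target - f_b) / f[k] * width

def centile_high(p: float) -> float:
    """Check formula, frequencies cumulated from the high-score end."""
    f_desc = f[::-1]                                   # top interval first
    upper_desc = [x + width for x in lower_limits][::-1]
    cf = list(accumulate(f_desc))
    target = (1 - p) * N
    k = bisect_left(cf, target)
    f_a = cf[k - 1] if k else 0                        # frequencies above X_u
    return upper_desc[k] - (target - f_a) / f_desc[k] * width

for pct in (90, 75, 67, 50, 33, 25, 10):
    lo, hi = centile_low(pct / 100), centile_high(pct / 100)
    print(f"C{pct}: {lo:.2f} vs check {hi:.2f} -> agree: {isclose(lo, hi)}")
```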


Comparison of Estimated and Computed Centile Values 

If centile values are estimated from a carefully drawn centile graph, the 
results should be the same, or at least approximately the same, as computed 
centile values. In Table 6:4 the results as estimated from the centile graph in 
Fig. 6:2 are compared with the computed centile values in Table 6:3. It will 
be observed that the estimated and computed values in Table 6:4 are the 
same in all cases to within one-tenth of a month sitting age. For all practical 
purposes the results are “identical.” 

Table 6:4. Comparison of Estimated and Computed Centile Values for
the Sitting-Age Data of 131 Infants

                    Sitting Age
                    Estimates from         Computed Values
Centile Values      Centile Graph          (Table 6:3)
                    (Fig. 6:2, Table 6:1)
C90  D9             9.4 months             9.4 months
C75  Q3             8.3 months             8.3 months
C67  T2             8.0 months             8.0 months
C50  Mdn            7.3 months             7.3 months
C33  T1             6.6 months             6.6 months
C25  Q1             6.3 months             6.3 months
C10  D1             5.6 months             5.6 months


When many centile values are to be determined for a distribution, the
centile graph is thus a labor-saving device that can yield as satisfactory
results as computed values. Furthermore, it has an advantage over the
computational procedure in that it affords a picture of the trend of the results.
Finally, once a centile graph is made, any centile values which may be needed
later for comparative or other purposes can readily be read from the graph.

D. CENTILE MEASURES 

The data of a variate distribution can be summarized by various measures 
based on point centile values. These centile measures, which have already 
been referred to as the median, terciles, quartiles, quintiles, deciles, and 
vigintiles, will now be described more fully. 

The Median (A Measure of Central Tendency?) 

The median, by definition,* is a centile point value in a scale of scores or
measures such that the total distribution of frequencies is divided into two
equal parts at that point. In other words, 50% of the frequencies are above
the score value of the median and 50% are below it. The median always
signifies exactly this, and is therefore equal to C50 (the value of the 50th
centile point). Furthermore, for variables which show a uni-modal tendency
near the center of the distribution, the median serves as a useful measure of
central tendency.

Sometimes variables yield distributions which are uni-modal but at the 
same time skewed; that is, instead of being bilaterally symmetrical from the 
modal part of the distribution, the scores spread out much farther at one 
end than at the other. In such cases the median, as a measure of central 
tendency, usually provides a more typical score than does the arithmetic 
mean. This point will be developed further in the next chapter. 

The formula for the median, which states the operations needed to determine
the value of C50, and is a special case of Formula 6:1, is as follows:

$$\text{Mdn} = X_l + \left(\frac{N/2 - f_b}{f_i}\right) i \qquad \text{Median (Mdn), special case of Formula 6:1}$$

where $X_l$ is equal to the mathematical value of the lower limit of the interval
in which the median is located; N is the total number of frequencies in the
distribution; $f_b$ is the number of frequencies below the lower limit, $X_l$; $f_i$ is
the number of frequencies in the interval in which the median is located; and
i is the size of the class interval.
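Python's standard library happens to implement this same interpolation in `statistics.median_grouped`, which treats each data value as the mid-point of a class interval of the given width. The minimal sketch below compares it with the median formula evaluated by hand for the sitting-age data; expanding the table into one value per infant (`raw`) is only a convenience for the library call.

```python
from statistics import median_grouped

# Sitting-age data of Table 6:2: class mid-points (months) and frequencies.
ages = [5, 6, 7, 8, 9, 10, 11, 12]
f = [9, 29, 36, 30, 15, 6, 5, 1]

# The median formula with the quantities read from Table 6:2.
N, X_l, f_b, f_i, i = 131, 6.5, 38, 36, 1.0
mdn_formula = X_l + (N / 2 - f_b) / f_i * i

# The standard library's grouped median, fed one value per infant.
raw = [age for age, count in zip(ages, f) for _ in range(count)]
mdn_stdlib = median_grouped(raw, interval=1)

print(f"{mdn_formula:.4f} {mdn_stdlib:.4f}")
```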


* The median is also sometimes defined as the mid-score or mid-case of an array of
measures. Although this concept provides a quick measure of the median when N is odd, it
does not fit the point value concept of centiles as developed in this chapter. For example,
in dealing with the data of distributions which are assumed to be continuous, it would be
difficult to derive centiles, vigintiles, deciles, etc., in terms of the mid-score concept
of the median. Hence we shall not be concerned with this concept; rather we shall compute
the median, the value of C50, as any other centile measure is computed.


The D Range—Measure of Dispersion 


The pairs of centile point values which are ordinarily used together to
summarize the dispersion or spread of scores in a distribution were shown in
Fig. 6:2 by similar types of vertical lines projected from the curve to the
sitting-age scale. Thus, D1 (or C10) and D9 (or C90) are used to give the D range,
which includes the range of the middle 80 per cent of the frequencies:

$$D = C_{90} - C_{10} \qquad \text{[6:2] D range}$$

As was indicated in Table 6:1, the D range for the data in Fig. 6:2 was found
to be 5.6 to 9.4. Thus, the middle 80% of the distribution of 131 infants have
sitting ages ranging from 5.6 to 9.4 months, and D = 9.4 - 5.6 = 3.8 months.
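Formula 6:2 can also be applied to unrounded centiles. The minimal sketch below (helper name hypothetical) computes C10 and C90 from Table 6:2 with Formula 6:1 and takes their difference; the result, about 3.79 months, rounds to the 3.8 months obtained from the tabled values 5.6 and 9.4.

```python
from bisect import bisect_left
from itertools import accumulate

lower_limits = [4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5, 11.5]
f = [9, 29, 36, 30, 15, 6, 5, 1]
cf = list(accumulate(f))

def centile(p: float, width: float = 1.0) -> float:
    """Formula 6:1 for the sitting-age data of Table 6:2."""
    target = p * cf[-1]
    k = bisect_left(cf, target)
    f_b = cf[k - 1] if k else 0
    return lower_limits[k] + (target - f_b) / f[k] * width

c10, c90 = centile(0.10), centile(0.90)
d_range = c90 - c10                      # Formula 6:2
print(f"D range: {c10:.1f} to {c90:.1f} months; D = {d_range:.1f} months")
```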


The Quartile Deviation—Measure of Deviation or Variability 

C25 and C75 have long been used to give the range of the middle 50% of the
frequencies of a distribution. Since this is the range between the first and
third quartile points (Q1 and Q3), it is usually called the inter-quartile range.
It is ordinarily an even more stable part of a distribution than the D range.

When distributions tend to be bilaterally symmetrical with respect to the
median, the inter-quartile range provides the basis for a very useful measure
of variability, viz., Q.D., the quartile deviation. Q.D. is equal to one-half the
inter-quartile range:

$$\text{Q.D.} = \frac{Q_3 - Q_1}{2} = \frac{C_{75} - C_{25}}{2} \qquad \text{[6:3] Quartile deviation (Q.D.)}$$

The quartile deviation for the data in Fig. 6:2 is computed as follows:

$$\text{Q.D.} = \frac{8.3 - 6.3}{2} = \frac{2.0}{2} = 1.0 \text{ month}$$


For this distribution the median (C50) happens to be exactly midway on the
score scale between the values of the first and third quartiles (C25 and C75). As
indicated in Table 6:1, the median is 7.3, whereas the first quartile is 6.3
and the third quartile is 8.3. Consequently, in using the median as a point of
reference to summarize the deviational tendency of this distribution of sitting-age
data, we are warranted in stating the following:

$$\text{Mdn} \pm \text{Q.D.} = 7.3 \pm 1.0 = 6.3 \text{ to } 8.3 \text{ months}$$

The quartile deviation itself gives the range of 25% of the frequencies above
or below this median, but the median plus and minus the quartile deviation
gives exactly the range of the middle 50% of the cases. This relation of Mdn
to Q.D. is of course misleading if the point values of the inter-quartile range
are not bilaterally symmetrical with respect to the median. The use of the
quartile deviation with the median to summarize the deviational tendency
of a distribution is sound only when the median plus and minus Q.D. gives a
range in score values that corresponds closely to the actual point values of
the inter-quartile range as given directly by C25 and C75. Therefore, when
distributions do not tend to be bilaterally symmetrical with respect to the
median, the dispersion of the middle 50% of the frequencies should generally
be summarized by citing the actual values of C25 and C75 rather than by
computing the quartile deviation and using it with the median.
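Whether Mdn ± Q.D. can stand in for the actual quartile points is easy to check directly. In the minimal sketch below, `quartile_check` is a hypothetical helper; the first case uses the sitting-age values of Table 6:1, and the second uses made-up quartile values for a skewed distribution, labeled hypothetical, to show how the shortcut can misrepresent C25 and C75.

```python
def quartile_check(q1: float, mdn: float, q3: float):
    """Return Q.D. (Formula 6:3) and the range Mdn - Q.D. to Mdn + Q.D."""
    qd = (q3 - q1) / 2
    return qd, (mdn - qd, mdn + qd)

cases = {
    "sitting age (Table 6:1)": (6.3, 7.3, 8.3),
    "hypothetical skewed data": (10.0, 12.0, 20.0),
}
for label, (q1, mdn, q3) in cases.items():
    qd, (lo, hi) = quartile_check(q1, mdn, q3)
    print(f"{label}: Q.D. = {qd:.1f}; Mdn ± Q.D. = {lo:.1f} to {hi:.1f}; "
          f"actual C25 to C75 = {q1:.1f} to {q3:.1f}")
```

For the sitting-age data the two ranges coincide; for the hypothetical skewed case, Mdn ± Q.D. gives 7.0 to 17.0 while the actual quartile points are 10.0 and 20.0, which is the situation in which the text recommends citing C25 and C75 directly.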

The Tercile Deviation—Measure of Deviation or Variability 

It is often useful to divide distributions of frequencies into three equal
parts; hence, the tercile range. The first tercile interval is the range from the
lowest score values of a distribution to the point value of C33. This latter centile
value is the first tercile point value (T1); the second tercile point (T2) is at
C67; and the third tercile value corresponds to C100. The inter-tercile range
is equal to the middle tercile interval T1 to T2 (or C33 to C67) and for the data
in Fig. 6:2 is 6.6 to 8.0. Hence, the sitting ages of the middle third of the 131
infants range from 6.6 to 8.0 months.

A measure of deviation analogous to the quartile deviation has also been
developed for the tercile range. The principle for its computation is the same;
hence T.D., the tercile deviation, may be symbolized as follows:

$$\text{T.D.} = \frac{T_2 - T_1}{2} = \frac{C_{67} - C_{33}}{2} \qquad \text{Tercile deviation (T.D.)}$$

The tercile deviation is thus equal to half the inter-tercile range, just as the
quartile deviation is equal to half the inter-quartile range. For the sitting-age
data in Fig. 6:2,

$$\text{T.D.} = \frac{8.0 - 6.6}{2} = \frac{1.4}{2} = .70 \text{ month}$$

The tercile deviation gives the range of one-sixth of the frequencies above or
below the median, provided, of course, the value of the median is midway
between the point values of T1 and T2. When this is the case, the median plus
and minus T.D. gives the range of the middle one-third of the measures of a
distribution. For the data in Fig. 6:2,

$$\text{Mdn} \pm \text{T.D.} = 7.3 \pm .70 = 6.6 \text{ to } 8.0 \text{ months}$$

These values correspond to the actual C33 and C67 point values; consequently,
it is appropriate to use the median as a point of reference for a tercile measure
of deviation to describe the results of this particular distribution.
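The tercile deviation is the same half-range computation applied to T1 and T2. A minimal sketch with the sitting-age values of Table 6:1 (`half_range` is a hypothetical helper), checking that Mdn ± T.D. reproduces the actual tercile points for these data:

```python
def half_range(lower_point: float, upper_point: float) -> float:
    """Half the distance between two centile points (Q.D. for quartiles, T.D. for terciles)."""
    return (upper_point - lower_point) / 2

# Sitting-age values from Table 6:1.
t1, mdn, t2 = 6.6, 7.3, 8.0
td = half_range(t1, t2)
print(f"T.D. = {td:.2f} month")
print(f"Mdn ± T.D. = {mdn - td:.1f} to {mdn + td:.1f} months (T1 to T2 = {t1} to {t2})")
```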

Tercile divisions of a distribution have in recent years been employed by
some investigators in order to set up a threefold differentiation of criterion
scores used in the validity analysis of test results: average group (T1 to T2),
above average group (T2 and above), and below average group (below T1).
For such a purpose this is often a better division than quartiles, in which an
"average" group is taken as lying within the limits of the inter-quartile
range (Q1 to Q3).
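The threefold split can be sketched in a few lines of Python. The criterion scores below are hypothetical, used only for illustration. The tercile points come from `statistics.quantiles` (Python 3.8 or later), which interpolates between individual raw scores rather than between class limits as in Formula 6:1, so it is a swapped-in method for ungrouped data. Scores exactly at T1 are counted as average and scores at T2 as above average, following the grouping just described.

```python
from statistics import quantiles

# Hypothetical criterion scores, used only to illustrate the threefold split.
scores = [12, 15, 17, 18, 20, 21, 23, 24, 26, 27, 29, 31]

# Tercile points T1 and T2 (default 'exclusive' method, interpolating
# between raw scores rather than between class limits).
t1, t2 = quantiles(scores, n=3)

def classify(score: float) -> str:
    if score < t1:
        return "below average"
    if score < t2:
        return "average"
    return "above average"

groups = {label: [s for s in scores if classify(s) == label]
          for label in ("below average", "average", "above average")}
print(f"T1 = {t1:.2f}, T2 = {t2:.2f}")
for label, members in groups.items():
    print(label, members)
```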





E. THE USE OF CENTILES FOR COMPARING THE RESULTS
OF TWO OR MORE DISTRIBUTIONS OF A VARIABLE

Centiles are valuable not only for describing and summarizing important
details of a distribution but also for comparing two or more distributions of a
variable. For the latter purpose, either graphs or tables (or both) may be
employed.

We shall present both a graph and a table to compare teachers' salaries as
distributed for three school systems, X, Y, and Z, located in areas suburban
to New York City.* The salary distributions for each of these systems are
presented in Table 6:5. The number of teachers reported for school system X
was 164; for school system Y, 151; for school system Z, 114. Both the original
frequency distributions and the percentage cumulative frequency distributions
are given in this table. These data provide the basis for the centile graphs
shown in Fig. 6:3. We shall proceed to interpret the results of these graphs
and then present a tabular comparison of teachers' salaries for each of the
three school systems (Table 6:6).

Table 6:5. Distributions of Teachers' Salaries for Three Public School
Systems in New York State

                    School System X      School System Y      School System Z
Annual Salaries       f  c.f.  % c.f.      f  c.f.  % c.f.      f  c.f.  % c.f.
$4301 to $4500                                                  3   114   100.0
$4101 to $4300                                                  4   111    97.4
$3901 to $4100                             9   151   100.0      3   107    93.9
$3701 to $3900                            11   142    94.7     18   104    91.2
$3501 to $3700        1   164   100.0     15   131    87.3     12    86    75.4
$3301 to $3500       16   163    99.4     10   116    77.3     18    74    64.9
$3101 to $3300       19   147    89.6     23   106    70.7     16    56    49.1
$2901 to $3100        8   128    78.0     27    83    55.3      9    40    35.1
$2701 to $2900       22   120    73.2     19    56    37.3      8    31    27.2
$2501 to $2700       57    98    59.8     12    37    24.7     16    23    20.2
$2301 to $2500       17    41    25.0     10    25    16.7      1     7     6.1
$2101 to $2300       13    24    14.6      6    15    10.0      5     6     5.3
$1901 to $2100        4    11     6.7      8     9     6.0      1     1     0.8
$1701 to $1900        3     7     4.3      1     1     0.7
$1501 to $1700        3     4     2.4
$1301 to $1500        1     1     0.6
                    N = 164              N = 151              N = 114
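The medians that will be read from the centile graphs can be cross-checked by computing them directly from Table 6:5 with Formula 6:1. This is a minimal sketch, assuming salaries were recorded in whole dollars so that the real class limits lie at $1300.50, $1500.50, and so on; under that assumption the computed medians come out near $2644, $3045, and $3312, within about $12 of the approximate graph readings ($2640, $3040, $3300) discussed in the paragraphs that follow. `centile` is a hypothetical helper name.

```python
from bisect import bisect_left
from itertools import accumulate

# Table 6:5, from the $1301-$1500 class upward (16 classes of $200 each).
# Assumption: salaries are whole dollars, so real class limits fall at $x00.50.
lower_limits = [1300.5 + 200 * k for k in range(16)]
freqs = {
    "X": [1, 3, 3, 4, 13, 17, 57, 22, 8, 19, 16, 1, 0, 0, 0, 0],
    "Y": [0, 0, 1, 8, 6, 10, 12, 19, 27, 23, 10, 15, 11, 9, 0, 0],
    "Z": [0, 0, 0, 1, 5, 1, 16, 8, 9, 16, 18, 12, 18, 3, 4, 3],
}

def centile(p: float, f: list, width: float = 200.0) -> float:
    """Formula 6:1 applied to one school system's salary distribution."""
    cf = list(accumulate(f))
    target = p * cf[-1]
    k = bisect_left(cf, target)
    f_b = cf[k - 1] if k else 0
    return lower_limits[k] + (target - f_b) / f[k] * width

medians = {name: centile(0.50, f) for name, f in freqs.items()}
for name, value in medians.items():
    print(f"System {name}: N = {sum(freqs[name])}, median = ${value:,.0f}")
```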

The variable, teachers' salaries, is scaled at the bottom of Fig. 6:3; the
salaries range from $1300 to $4500 per year. The percentage cumulative frequency
curves for each of the three schools are plotted according to the procedure
already described for centile graphs.

* These data for teachers' salaries are based on figures assembled for the school year
1943-1944, and made available by Mr. Vernon G. Smith, Superintendent of Schools,
Scarsdale, New York.

Fig. 6:3. Comparison of Teachers' Salaries for Three School Systems in Areas Suburban
to New York City for 1943-1944. Principals' Salaries Excluded. [Abscissa: Teachers' Annual Salaries]

Only the upper and lower tercile values for each of the three school systems
are shown in the figure. However, centile graphs are very convenient for the
comparison of two or more distributions at any point thereof. Once the
graphs are made, they can be referred to at any time, and any desired values
quickly derived. Furthermore, they can be read from either direction. That
is, the percentage of teachers above or below a certain salary, as well as the
salary value of any given centile point, can be readily determined. For example,
Fig. 6:3 reveals that none of the teachers in system X received an
annual salary in excess of $3700, and, on the other hand, that none of the
teachers in system Z were paid less than $1900 a year.

The median salaries for each system are readily obtained by projecting a
horizontal line from the 50% point on the ordinate scale across to each of the
three curves and dropping vertical lines at the three points of intersection to
the base line. These medians are approximately $2640, $3040, and $3300,
for X, Y, and Z respectively. The difference between the median salaries of
X and Z is considerable, amounting to $660. However, there are also considerable
differences at most points throughout the three distributions. Thus,
examination of the lower and upper tercile values reveals that two-thirds of
the teachers in system X received an annual salary of less than $2810 (T2),
whereas more than two-thirds of the teachers in both systems Y and Z received
annual salaries in excess of $2810. One-third of the teachers in system Y
received salaries greater than $3260 (T2), and one-third of the teachers in
system Z received salaries greater than $3540 (T2). On the other hand, one-third
of the teachers in system X received salaries of less than $2540 (T1),
whereas only 18% of the teachers in system Y and only 9% of the teachers in
system Z received less than this amount.

Inspection of the slopes of the centile curves for each of the school systems
is especially revealing. The steeper the slope, the less the dispersion or spread
of salaries and the greater the concentration of cases. Thus, for school system
X, the slope of the curve is very steep for salaries between $2500 and
$2700. Within these relatively narrow limits, the salaries of 35% of the teachers
in this system are located, since the centile point value of a salary of $2500
is approximately C25, and the centile point value of a salary of $2700 is
approximately C60. The difference between C60 and C25 is equivalent to 35% of the
cases. On the other hand, there is no such sharp and extended rise in the
slope of the curves for systems Y and Z. This means that the salaries of the
teachers in these systems were spread much more evenly through the scale
than were the salaries paid in X.
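Because the steepness of a centile curve reflects how many cases fall in a given stretch of the scale, the same comparison can be made numerically by finding, for each system, the $200 class interval of Table 6:5 that holds the largest share of the teachers. A minimal Python sketch (names hypothetical); for system Z two classes tie, and `max` simply reports the first.

```python
# Table 6:5 frequencies, from the $1301-$1500 class upward.
classes = [f"${1301 + 200 * k} to ${1500 + 200 * k}" for k in range(16)]
freqs = {
    "X": [1, 3, 3, 4, 13, 17, 57, 22, 8, 19, 16, 1, 0, 0, 0, 0],
    "Y": [0, 0, 1, 8, 6, 10, 12, 19, 27, 23, 10, 15, 11, 9, 0, 0],
    "Z": [0, 0, 0, 1, 5, 1, 16, 8, 9, 16, 18, 12, 18, 3, 4, 3],
}

# The steepest rise of a centile curve over one class interval corresponds to
# the class holding the largest share of the cases.
steepest = {}
for name, f in freqs.items():
    k = max(range(len(f)), key=f.__getitem__)   # first class with the largest f
    steepest[name] = (classes[k], 100 * f[k] / sum(f))
    print(f"System {name}: steepest over {classes[k]}, {steepest[name][1]:.1f}% of teachers")
```

For system X this gives 57 of 164 teachers, about 35%, in the $2501 to $2700 class, in line with the reading from Fig. 6:3; the largest single-class shares for Y and Z are roughly half that or less.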

As for the tabular comparisons: Four sets of centile values, commonly used
for comparative purposes, are presented in Table 6:6. All have been determined
directly from the centile graphs in Fig. 6:3, rather than computed from
the frequency distributions shown in Table 6:5. They are the D range, the
inter-tercile range, the inter-quartile range, and, finally, the median with the
quartile and tercile deviations. Not all of these four sets of data are necessary
for the tabular comparison of two or more distributions. Often only the
median and quartile deviation are reported. Whether or not the D range and
the inter-tercile range are also used depends on the nature of the data being
compared, as well as upon the detail necessary for the report.

The D range, as we have seen, gives the dispersion of the middle 80% of 
the cases in a distribution. In all three school systems compared in Table 6:6, 
this range is greater than $1000, being nearly $1500 for the teachers in 
system Y. It will be observed, furthermore, that there is a difference of $600 
between the upper limits (C90) of the D range for school systems X and Z. 
Ten per cent of the teachers in system Z received salaries in excess of $3900, 
whereas the upper 10% in system X were paid between $3300 (C90) and $3700 
(upper limit). 

The inter-tercile range was referred to in the description of Fig. 6:3. The 
dispersion of the middle one-third of the teachers’ salaries varies from $270 
for X to $500 for Z. Two-thirds of the teachers in system X received salaries
of less than $2810 (T2), whereas two-thirds of the teachers in system Z
received salaries of more than $3040 (T1).

The inter-quartile range for system X is $500, being the same as the
inter-tercile range of the salaries in system Z. The middle 50% of the teachers'
salaries for the latter system varied from $2840 to $3700, a range of $860.
Seventy-five per cent of the teachers in system X were paid less than $3000
(Q3), whereas 75% of the teachers of system Z received salaries of more than
$2840 (Q1).

Table 6:6. Comparison of Teachers' Salaries for Three Public
School Systems

(Determined from Centile Graphs in Fig. 6:3)

                         School System X   School System Y   School System Z
Centile Values              Salaries          Salaries          Salaries

D9 (C90)                    $3300             $3775             $3900
D1 (C10)                    $2175             $2300             $2550
D range                 $2175 to $3300    $2300 to $3775    $2550 to $3900
                             = $1125           = $1475           = $1350

T2 (C67)                    $2810             $3260             $3540
T1 (C33)                    $2540             $2840             $3040
Inter-tercile range     $2540 to $2810    $2840 to $3260    $3040 to $3540
                             = $270            = $420            = $500

Q3 (C75)                    $3000             $3425             $3700
Q1 (C25)                    $2500             $2725             $2840
Inter-quartile range    $2500 to $3000    $2725 to $3425    $2840 to $3700
                             = $500            = $700            = $860

Median (C50)                $2640             $3040             $3300
Q.D.                        $250              $350              $430
T.D.                        $135              $210              $250

The median salaries in Table 6:6 indicate that half the teachers in system X 
received salaries of less than $2640, whereas half the teachers in system Y 
received salaries in excess of $3040, and half those in system Z received salaries 
in excess of $3300. 

The values of the quartile deviations and tercile deviations are equal to
half the inter-quartile and inter-tercile ranges in the upper part of the table.
If the median of each distribution lies approximately midway between its
upper and lower quartile points and its upper and lower tercile points, these
ranges can be tersely summarized in terms of the quartile deviation and tercile
deviation. Let us examine first the results for school system X.

The median ± the quartile deviation of system X is equal to $2640 ± $250, 
which gives a range of $2390 to $2890. The actual lower and upper quartile 
point values for this distribution are $2500 and $3000. The median is not 
midway between these quartile points, and hence the results can be described 
more accurately in terms of the actual quartile points rather than in terms 
of the quartile deviation. 










In system Y, the median ± the quartile deviation is equal to $3040 ± $350, 
which gives a range of $2690 to $3390. Since the actual quartile point values 
for this distribution are $2725 and $3425, the use of the latter, rather than 
the quartile deviation, to describe the result is again preferable. 

In system Z, the median ± the quartile deviation is equal to $3300 ± $430, 
which gives a range of $2870 to $3730. The lower and upper quartile point 
values for this distribution are $2840 and $3700. Although system Z thus 
gives the closest correspondence of values, the quartile point values, rather 
than the quartile deviation, would be preferred in describing the results of 
this distribution. 

The above procedure for describing the results, in which the three
distributions are compared in terms of their medians and quartile deviations, is less
satisfactory than a direct comparison of the quartile points and medians.
This is true because the distributions are not bilaterally symmetrical with
respect to their medians. On the other hand, inspection of the tercile deviations
indicates that these measures satisfactorily describe the variability of the middle
one-third of the cases in each distribution, with respect to the medians of
each. Thus, the median ± the tercile deviation for system X is equal to
$2640 ± $135. This gives a range of $2505 to $2775. The actual lower and
upper tercile values are $2540 and $2810. In school system Y, the median ±
the tercile deviation is equal to $3040 ± $210, which gives a range of $2830
to $3250. These values practically coincide with the actual lower and upper
tercile values of $2840 and $3260. In system Z, the median ± the tercile
deviation is equal to $3300 ± $250. This gives a range of $3050 to $3550.
The actual lower and upper tercile values are again practically the same, viz.,
$3040 and $3540.

Thus, the tercile deviation taken in relation to its median provides a
satisfactory description of the mid-variability of each distribution, whereas for
these particular distributions the quartile deviations are less satisfactory
than the quartile points themselves. It is therefore apparent that quartile
deviations and tercile deviations should not be used to summarize and
compare the variability of distributions unless examination of the results shows
that the median plus and minus either of these measures of deviation gives
ranges that closely correspond to the actual quartile or tercile point values.
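The check described above can be carried out mechanically. The Python sketch below takes the centile points read from Table 6:6 (the dictionary `TABLE_6_6` and the helper `describe` are illustrative names, not from the text), computes each deviation as half the corresponding range, and reports how far the median sits from the midpoint of the two centile points. That offset is exactly the amount by which both ends of median ± deviation miss the actual points, so it is the quantity the examination turns on.

```python
# Centile points (dollars) as read from Table 6:6.
TABLE_6_6 = {
    "X": {"Q1": 2500, "Q3": 3000, "T1": 2540, "T2": 2810, "Mdn": 2640},
    "Y": {"Q1": 2725, "Q3": 3425, "T1": 2840, "T2": 3260, "Mdn": 3040},
    "Z": {"Q1": 2840, "Q3": 3700, "T1": 3040, "T2": 3540, "Mdn": 3300},
}


def describe(system, lower, upper):
    """Return (deviation, median -/+ deviation, actual points, offset).

    The deviation (Q.D. or T.D.) is half the distance between the two
    centile points. Both ends of median -/+ deviation miss the actual
    points by the same offset: the distance of the median from their midpoint.
    """
    p = TABLE_6_6[system]
    deviation = (p[upper] - p[lower]) / 2
    estimated = (p["Mdn"] - deviation, p["Mdn"] + deviation)
    offset = p["Mdn"] - (p[lower] + p[upper]) / 2
    return deviation, estimated, (p[lower], p[upper]), offset


if __name__ == "__main__":
    for system in "XYZ":
        for lower, upper, label in (("Q1", "Q3", "Q.D."), ("T1", "T2", "T.D.")):
            dev, (lo, hi), (a, b), off = describe(system, lower, upper)
            print(f"{system} {label} = {dev:g}: {lo:g} to {hi:g} "
                  f"vs. actual {a} to {b} (offset {off:+g})")
```

The offsets come out to -110, -35 and +30 for the quartiles of X, Y and Z, and to -35, -10 and +10 for the terciles. The sketch only reports them; deciding how small an offset is small enough remains a judgment, as in the discussion above.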


F. THE USE OF THE CENTILE METHOD FOR COMPARING 
THE RESULTS OF TWO OR MORE VARIABLES 

The comparisons of teachers' salaries in three different school systems by
means of centile graphs illustrate the detail with which the centile method
can be used to compare two or more different distributions of the same
variable. The same method is also useful for comparing two or more different
variables, provided, of course, there is a logical basis for considering them
together. The procedure is aptly illustrated by the following market research
data on two aspects of people's habits, viz., rising and breakfasting.*

Two samples totaling “6705 representative families” of New York City 
were personally interviewed by field interviewers of Crossley, Inc., for radio 
Station WOR of New York. About one-half of the total group was asked: 

“What was the earliest time last Sunday that a member of your 
family was up?” 

The other half was asked: 

“What time did the family have breakfast last Sunday?” 

The results of this market survey are summarized in Table 6:7, and the
behavior of the two groups, each for its respective variable, is compared by
the centile graphs in Fig. 6:4. The data are somewhat surprising, for it had
been thought that most New Yorkers sleep much later on Sunday mornings.


Fig. 6:4. “When New Yorkers Get Up on Sunday Mornings”†

† Broadcasting Magazine.


The survey results indicate that the median first-riser of a family is up
before 8 o'clock; two-thirds of the first-risers are up by 8:30; and 4 out of
every 5 are up by 9 o'clock. Furthermore, by 8:30, half the families had eaten
or were eating breakfast, and by 9 o'clock two-thirds of them had eaten or
were eating. Less than 5% arose or breakfasted after 11 o'clock.

* Ray Lyon, “New Yorkers Early Risers,” Broadcasting Magazine, March 26, 1945, p. 34.




Table 6:7. “When New Yorkers Get Up on Sunday Mornings”

                                  f                         % c.f.
Time of Morning          First      Family         First      Family
                         Person Up  Breakfast      Person Up  Breakfast

12:30 and later              9          20           100%       100%
12:00 noon                  74          59            99.7       99.4
11:30                        9          34            97.5       97.7
11:00                       93         261            97.2       96.7
10:30                       48          98            94.3       89.0
10:00                      287         510            92.9       86.2
9:30                       109         160            84.2       71.2
9:00                       465         565            80.9       66.5
8:30                       328         490            66.8       49.9
8:00                       653         690            56.9       35.5
7:30                       460         219            37.1       15.3
7:00                       423         217            23.1        8.9
6:30                       121          46            10.3        2.5
6:00                       164          39             6.6        1.1
5:30 A.M. and earlier       54           0             1.6        0.0

N =                       3297        3408           100%       100%
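As a check on the table, the Python sketch below recomputes the % c.f. columns from the frequencies and interpolates the median time for each variable. It assumes, as the readings "two-thirds of the first-risers are up by 8:30" imply, that each row counts the cases in the half hour ending at the time shown, with cases spread evenly within it; the first and last rows are open-ended, but only middle rows are needed here. The names `FIRST_UP`, `BREAKFAST`, `centile_time` and `hhmm` are illustrative.

```python
# Frequencies from Table 6:7, ordered from earliest to latest half hour:
# 5:30 A.M. and earlier, 6:00, 6:30, ..., 12:00 noon, 12:30 and later.
FIRST_UP = [54, 164, 121, 423, 460, 653, 328, 465, 109, 287, 48, 93, 9, 74, 9]
BREAKFAST = [0, 39, 46, 217, 219, 690, 490, 565, 160, 510, 98, 261, 34, 59, 20]
# Upper limit of each row in minutes after midnight (5:30 = 330, ..., 12:30 = 750).
UPPER = [330 + 30 * k for k in range(15)]


def cumulative_percent(freqs):
    """Cumulative per cent (% c.f.) up to and including each row, to one decimal."""
    n = sum(freqs)
    out, running = [], 0
    for f in freqs:
        running += f
        out.append(round(100 * running / n, 1))
    return out


def centile_time(freqs, p):
    """Time (minutes after midnight) below which p per cent of the cases fall.

    Assumes each row covers the half hour ending at its label and that cases
    are spread evenly within it (linear interpolation).
    """
    if not 0 <= p <= 100:
        raise ValueError("p must lie between 0 and 100")
    target = sum(freqs) * p / 100
    running = 0
    for upper, f in zip(UPPER, freqs):
        if f and running + f >= target:
            return upper - 30 + 30 * (target - running) / f
        running += f
    return UPPER[-1]  # guard against floating-point rounding at p = 100


def hhmm(minutes):
    hours, mins = divmod(round(minutes), 60)
    return f"{hours}:{mins:02d}"


if __name__ == "__main__":
    print(cumulative_percent(FIRST_UP))
    print(cumulative_percent(BREAKFAST))
    print("Median first person up:", hhmm(centile_time(FIRST_UP, 50)))
    print("Median family breakfast:", hhmm(centile_time(BREAKFAST, 50)))
```

Rounded to one decimal, the recomputed percentages agree with the table except at 10:30 for the first person up, where 3112/3297 gives 94.4 against the printed 94.3. The interpolated medians, about 7:50 for the first person up and about 8:30 for breakfast, agree with the statements in the text.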


The steepest part of the “family breakfast” curve in Fig. 6:4 would be the
optimum time for breakfast radio programs, for this is the period when the
greatest number of families are eating breakfast. It begins at 7:30 and lasts until 9.
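One way to locate the steepest part of a cumulative curve without drawing it is to find the run of consecutive intervals that holds the most cases. The Python sketch below slides a window of three half-hour rows over the breakfast frequencies of Table 6:7; the 90-minute width is an assumption chosen for illustration, not a figure from the survey, and `busiest_window` is a hypothetical helper name. Rows are again assumed to cover the half hour ending at their label.

```python
# Family-breakfast frequencies from Table 6:7, earliest half hour first
# (5:30 A.M. and earlier, 6:00, ..., 12:30 and later).
BREAKFAST = [0, 39, 46, 217, 219, 690, 490, 565, 160, 510, 98, 261, 34, 59, 20]
UPPER = [330 + 30 * k for k in range(15)]  # row upper limits, minutes after midnight


def busiest_window(freqs, width):
    """Start index and total of the `width` consecutive rows with the most cases."""
    starts = range(len(freqs) - width + 1)
    best = max(starts, key=lambda s: sum(freqs[s:s + width]))
    return best, sum(freqs[best:best + width])


def hhmm(minutes):
    hours, mins = divmod(minutes, 60)
    return f"{hours}:{mins:02d}"


if __name__ == "__main__":
    width = 3  # assumed window: three half hours
    start, total = busiest_window(BREAKFAST, width)
    begin, end = UPPER[start] - 30, UPPER[start + width - 1]
    share = 100 * total / sum(BREAKFAST)
    print(f"{hhmm(begin)} to {hhmm(end)}: {total} families ({share:.1f}%)")
```

The busiest window runs from 7:30 to 9:00 and holds 1745 of the 3408 families. Ranking single rows alone would mislead here: the 10:00 row (510) outnumbers the 8:30 row (490) but is isolated between much smaller rows.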


EXERCISES 

1. Why is the assumption of a continuously distributed variable essential to the 
use of the centile point method? 

2. What are the implications of centile point values and centile intervals?

3. For any distribution, are there more frequencies between and Cps than between
C45 and Cbfi? Why?

4. What are the centile point limits of: 

a. the inter-quartile range 

b. the inter-tercile range 

c. the 8th decile interval 

d. the 3rd quartile interval 

e. the 3rd tercile interval 

f. the D range 

g. the range 

h. the 12th vigintile interval 

i. the 1st quintile interval 

5. What proportion of the frequencies of a distribution lie within the limits of:

a. the D range

b. the inter-quartile range

c. the inter-tercile range 













6. Under what circumstances can and cannot a distribution be adequately
summarized by:

a. the median ± the quartile deviation 

b. the median ± the tercile deviation 

7. In what centile interval do the following measures lie: 

a. the median 

b. the lower quartile (Q1)

c. the upper tercile (T2)

d. the 8th decile (D8)

8. For each of the variables in Table 6:7, summarize and interpret the results in
terms of the following centile measures, both by estimates from a centile graph
and by computed centile values:

a. median

b. tercile deviation

c. quartile deviation

d. the D range

9. Determine whether the variation characteristics of each variable in Table 6:7 can 
be adequately summarized by: 

a. the median ± the quartile deviation 

b. the median ± the tercile deviation 

10. Compare each of the three variables in Table 5:14 in terms of centile values 
derived from a centile graph, and interpret the results for the following: 

a. average grades of college freshmen and of their best friends 

b. intelligence test scores of college freshmen and of their best friends 

c. ages of college freshmen and of their best friends 



CHAPTER 7 


The Mean and Standard Deviation 


A. THE METHOD OF MOMENTS FOR VARIATE DATA 

We shall present in this chapter another statistical method that is widely
used for the summarization and comparison of variate data. In contrast to
the centile method developed in the preceding chapter, the statistical
measures now to be discussed are based on deviations from the arithmetic mean
of the distribution rather than on the location of frequencies at various
points on a scale of measures.

The chief measures to be obtained are the arithmetic mean (M) and the
standard deviation (σ). The method of their computation is essentially
algebraic and is often described as the method of moments.*

Both the arithmetic mean and the standard deviation are widely used in
sampling and analytical statistics, as well as in descriptive statistics. Another
measure of deviation is the average deviation (A.D.). Since it is not used so
frequently, it will be described, for reference purposes, at the end of this
chapter.


Basic Symbols 

Henceforth, it will be convenient to employ the following commonly used 
symbols for the basic measures and procedures of the method of moments: 

1. A measure or score of a variate is symbolized by the capital letter X. If
two or more distributions are compared, the measures of each are
symbolized by different capital letters, for example, by X, Y, and Z, or A, B, C,
. . . Z.

2. Any particular measures of a variate are symbolized by numerical
subscripts to the capital letter, as for example, $X_1, X_2, \ldots, X_n$.

* The arithmetic mean of a distribution is a moment of the first order, and a standard
deviation is the square root of the moment of the second order taken with respect to the
mean.

The term “moment” as used in statistics has been taken from physics, where a moment
is a measure of a force with respect to the tendency of the force to produce rotation. The
strength of this tendency varies according to the amount of force and the distance from the
point at which the force is applied. In a frequency distribution, the arithmetic mean is
taken as the origin, and the frequencies of each class interval are taken as the forces at
distances in terms of $x_1, x_2, x_3, \ldots, x_n$. The mean is the moment of the first order; it is
symbolized by $\mu_1$ and is equal to $\Sigma fx/N$. The moment of the second order is symbolized
by $\mu_2$ and is equal to $\Sigma f(x^2)/N$.




3. The arithmetic summation of a series of measures is symbolized by S.

4. The algebraic summation of a series of measures is symbolized by the
Greek letter for capital S, viz., Σ.

5. The mean is symbolized by M. The median, as we have seen, is
represented by Mdn. The mode (point or interval with the greatest
concentration of frequencies) is symbolized by Mo. Thus these three symbols are
clearly differentiated from one another.

6. A deviation is symbolized by the small letter corresponding to the capital
letter used for the original measures or scores. Thus, x symbolizes the
deviate value of X; y, the deviate value of Y.

7. Generally, and unless specifically indicated otherwise, a deviation, x, is
always taken as the difference between an original measure, X, and the
mean of the distribution from which the measure is derived. Hence,

$$x = X - M$$

Deviations of measures greater in value than the mean are positive; of
measures less in value, negative. The latter are symbolized by a minus
sign.

8. Any particular deviations of a variate are symbolized by numerical
subscripts, as for example, $x_1, x_2, \ldots, x_n$.

9. Small letters are also used to symbolize a variable. Thus, a single variable
is usually designated as variable x; a second variable as variable y; a
third, as variable z. If there are more than three variables (but fewer than 27)
in an investigation, they may be symbolized by letters from the beginning
of the alphabet: variable a, variable b, etc., or each variable may be
numbered in succession and designated as variable 1, variable 2, etc.

Unfortunately, the symbols of statistics are not universally uniform.
However, the preceding symbols are commonly used in statistical work developed
for and applied to the biological and social sciences.

B. THE MEAN 
Definition 

The simplest and most common definition of the mean is that it is the sum 
of all the measures of a distribution divided by the total number (N) of 
measures. The mean is an average, and is often referred to as the arithmetic 
mean to distinguish it from other averages. However, the sum of all the 
measures is basically an algebraic rather than an arithmetical sum. Either 
method of summing will, of course, yield the same result if all the measures of 
a distribution are positive (or if all are negative) numbers, but not if the 
distributions include both negative and positive numbers. 

The mathematical definition of the mean is as follows: A mean is a number
such that the algebraic sum of the deviations of all measures from that number
is equal to zero. It is a measure for a distribution such that the sum of the
positive deviations exactly equals the sum of the negative deviations. Thus,
the mean of the following three numbers, 10, 6, and -10, is equal to:

$$M = \frac{X_1 + X_2 + X_3}{N} = \frac{10 + 6 + (-10)}{3} = \frac{6}{3} = 2.0$$

This result yields a number, 2.0, such that the algebraic sum of the deviations
of each measure from 2.0 is zero. Thus, where $x = X - M$:

$$x_1 + x_2 + x_3 = (10 - 2) + (6 - 2) + (-10 - 2) = 8 + 4 + (-12) = 0$$
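The defining property is easy to verify numerically. In this short Python sketch (the helper names `mean` and `deviation_sum` are illustrative), the deviations of 10, 6 and -10 from their mean sum to zero, while deviations from any other number, such as 3, do not; in general the algebraic sum of the deviations from a number a equals N(M - a).

```python
def mean(values):
    """Algebraic sum of the measures divided by N."""
    return sum(values) / len(values)


def deviation_sum(values, origin):
    """Algebraic sum of the deviations x = X - origin."""
    return sum(x - origin for x in values)


scores = [10, 6, -10]
m = mean(scores)
print(m)                         # 2.0
print(deviation_sum(scores, m))  # 0.0
print(deviation_sum(scores, 3))  # -3, i.e. N(M - 3) = 3(2 - 3)
```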

The mean is a measure that summarizes an essential aspect, the average,
of a variable distribution. It is one of the most important measures in
statistical theory, for it provides the investigator with a number which represents
the average value or size of all the different measures of a distribution. The
mean is a measure of the central tendency of distributions that tend to be of the
bell-shaped type. Since the frequencies are more concentrated near the
central part of a bell-shaped distribution than at any other part, the mean is an
index of the most typical measure of such a series. However, the mean is a
satisfactory measure of central tendency only when the distribution of
measures from which it is obtained tends to be uni-modal and bilaterally
symmetrical, i.e., with a concentration of measures around the mid-point of
the distribution and with a decrease in the measures above and below the
mid-point. Obviously, the mean is not a measure of central tendency for
distributions that tend to be U-shaped or J-shaped or rectangular (see Fig. 6:1)
and hence have no central tendency. Nevertheless, from a mathematical point
of view, the mean is a general measure and can be used to obtain a summary
statistic for any kind of distribution.

The point to be remembered is that most statistical measures are obtained
not only for the purpose of summarizing certain facts about a distribution of
measures, but also for comparative purposes. The mean of one distribution
is compared with that of another; but unless the two distributions are similar
in form, the comparisons are likely to be misleading. To compare the mean
of a J-type distribution with the mean of a normal-type distribution is an
example; only for the latter would the mean be a measure of central tendency
and represent the most typical score. But it would not be misleading to
compare the arithmetic means of two J-shaped distributions provided it were
clear that in such cases the mean is not a measure of central tendency or
representative of the typical score. The possible implications of the mean are
usually clarified by knowing the form of the distribution from which it is
derived. It is because so many variables in the biological and social sciences
yield uni-modal distributions of the bell-shaped type that the mean is often
described as a measure of central tendency.

In practice, there are three methods for computing the mean: 

I. A long method, with the data not organized into a frequency distribution.

II. A long method, with the data organized into a frequency distribution.

III. A short method, with the data grouped as for Method II.

Method I: The Mean from Unordered Data 

When an adding machine is available, Method I is the simplest of the three
methods, because it involves merely the summation of all the measures of a
distribution and the division of the sum by the total number of measures.
However, this method has a disadvantage in that it is often difficult, if not
impossible, to ascertain the form of a distribution, especially one with a
large range, unless the data are tallied into a frequency distribution with
class intervals greater than the original units of measurement. In other
words, it is often necessary to group the data into the class intervals of a
frequency distribution in order to determine whether, in fact, the variable
manifests a central tendency. Unless the original data of a group of
measurements are ordered into a frequency distribution, inappropriate measures may
be used to describe and summarize the results. We have already emphasized
the importance, in statistical practice, of first describing the data of a
variable by a frequency distribution. Methods II and III have therefore been
developed for this arrangement of the data.

The data presented in Table 7:1 were obtained from 30 subjects in an
experiment on personal tempo. Each measurement represents the metronome
rate that the subject judged to be the tempo he most preferred. There are
thus 30 measures, one for each subject. The data in the table are arranged in
three columns and are in the order in which they were originally obtained.

Table 7:1. Arithmetic Mean—Long Method with Ungrouped Data 
(Data: Preferred Metronome Rates as Judged by Subjects in an Experiment 

on Personal Tempo) * 


Subject No.   Score     Subject No.   Score     Subject No.   Score

     1         146          11          72           21         126
     2         180          12          72           22         176
     3          60          13         126           23         112
     4         104          14         152           24         120
     5         108          15         116           25         122
     6         132          16         144           26          96
     7         152          17         172           27         120
     8         116          18         126           28         132
     9          76          19         130           29         108
    10         180          20         150           30         104

S = 3730, N = 30

Arithmetic Mean = 3730/30 = 124.3

* These figures represent the tempo of the metronome beat most preferred by each of
30 subjects. The scores are the numbers of the usual metronome scale. (From Honors
Research Project in Psychology at The City College of New York, by Bernard Steinzor.)














The sum of the 30 measures is 3730. The mean of this group of measures
is therefore 3730 divided by 30 (the number of measures), or 124.3. Thus,

$$M = \frac{3730}{30} = 124.3$$

This method of computing the mean may be symbolized as follows:

[7:1]
$$M = \frac{\Sigma X}{N}$$
Arithmetic mean (M), from ungrouped data

where ΣX is the sum of all the measures and N is the total number of measures.

Method I has the advantage of quick computation when the total number
of measures is small or when an adding machine is available. This method is
widely employed in machine computations. However, in using it, the
investigator should set up, independently, a frequency distribution so that he will
know the implications of the average he obtains. Even careful study of
Table 7:1 does not reveal at all clearly whether the mean, 124.3, is typical, in
the sense that many of the 30 cases cluster around it.
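Method I amounts to a single sum and a division. The Python sketch below applies it to the 30 scores of Table 7:1 and, following the advice just given, also tallies them into the 20-unit class intervals used in Table 7:2 so that the form of the distribution can be seen. Pooling "180 and above" into one interval follows that table; the variable names are illustrative.

```python
from collections import Counter

# Preferred metronome rates of subjects 1-30, from Table 7:1.
SCORES = [146, 180, 60, 104, 108, 132, 152, 116, 76, 180,
          72, 72, 126, 152, 116, 144, 172, 126, 130, 150,
          126, 176, 112, 120, 122, 96, 120, 132, 108, 104]

total = sum(SCORES)            # S, the sum of all the measures
mean = total / len(SCORES)     # M = (sum of X) / N, ungrouped data

# Tally by lower class limit: 60-79, 80-99, ..., with 180 and above pooled.
tally = Counter(min(score // 20 * 20, 180) for score in SCORES)

if __name__ == "__main__":
    print(f"S = {total}, N = {len(SCORES)}, M = {mean:.1f}")
    for lower in sorted(tally, reverse=True):
        label = "180 and above" if lower == 180 else f"{lower}-{lower + 19}"
        print(f"{label:>13}  {'/' * tally[lower]}  {tally[lower]}")
```

The tally reproduces the frequencies 2, 2, 5, 9, 7, 1, 4 of Table 7:2, which show the concentration near the middle intervals that the unordered listing conceals.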


Method II: The Mean—Long Method with Data Grouped 
into a Frequency Distribution 

The data in Table 7:1 have been rearranged in Table 7:2 into a frequency
distribution with class intervals of 20 units. There are seven class intervals,
and their frequencies vary from 1 to 9. Since the measures tend to be
concentrated near the middle intervals of the distribution, the mean in this case
is a measure of central tendency.


Table 7:2. Arithmetic Mean—Long Method with Grouped Data 
(Data from Table 7:1) 


Class Intervals      f     Mid-Pt.    f(Mid-Pt.)

180 and above        2     189.5        379.0
160-179              2     169.5        339.0
140-159              5     149.5        747.5
120-139              9     129.5       1165.5
100-119              7     109.5        766.5
80- 99               1      89.5         89.5
60- 79               4      69.5        278.0

                N = 30              S = 3765.0

Range 60 to 180; N = 30; S = 3765

Arithmetic Mean = 3765.0/30 = 125.5


The computation of the mean by Method II is illustrated in Table 7:2. 
The procedure may be summarized as follows: 

1. Set up appropriate class intervals and make a frequency distribution of the 
original data. 


















2. Determine the mid-point values of each class interval. 

3. Multiply the mid-point value of each class interval by the number of
frequencies or cases in the interval. (These products appear in the last column
of the table.)

4. Obtain the algebraic sum of all these products.

5. Divide this sum by the number of measures (N). The quotient thus
obtained is the mean.

As indicated in the table, the sum is 3765.0. This value divided by 30 gives
a mean equal to 125.5. The procedure for Method II may be symbolized as
follows:

[7:2]
$$M = \frac{\Sigma(fX_{mp})}{N} = \frac{\Sigma(fX)}{N}$$
Arithmetic mean (M), from data grouped into class intervals

where $X_{mp}$ is the value of the mid-point of each class interval, and f is the
number of frequencies.
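Formula 7:2 can be sketched in Python as follows: each interval's mid-point stands in for every score in it. The sketch uses the intervals of Table 7:2 and treats "180 and above" as 180-199, so that its mid-point is 189.5 as in the table; `INTERVALS` and `grouped_mean` are illustrative names.

```python
# (lower limit, upper limit, f) for each class interval of Table 7:2.
# "180 and above" is treated as 180-199, matching the table's mid-point 189.5.
INTERVALS = [(180, 199, 2), (160, 179, 2), (140, 159, 5), (120, 139, 9),
             (100, 119, 7), (80, 99, 1), (60, 79, 4)]


def grouped_mean(intervals):
    """Method II: sum of f times mid-point, divided by N (Formula 7:2)."""
    n = sum(f for _, _, f in intervals)
    sum_f_mid = sum(f * (lower + upper) / 2 for lower, upper, f in intervals)
    return sum_f_mid / n, sum_f_mid, n


if __name__ == "__main__":
    m, s, n = grouped_mean(INTERVALS)
    print(f"S(fX) = {s}, N = {n}, M = {m}")  # S(fX) = 3765.0, N = 30, M = 125.5
```

Compared with the 124.3 obtained from the ungrouped scores, the grouped mean is 1.2 higher, the kind of mid-point discrepancy discussed next.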

The means obtained by the two methods have different values although
they are derived from the same data. With Method I, the mean is 124.3,
whereas with Method II it is slightly larger, 125.5. Such a difference is to be
expected; in fact, it would be most unlikely for the two means to have exactly
the same value. This is the case because Method II assumes that the
mid-point value of any class interval is a representative value for all the scores
in that interval. Obviously there will always be class intervals for which the
mid-point values do not coincide with an exact average of the measures in
the interval. But for large distributions the assumption is practical, and such
discrepancies between the mean values obtained with the two methods are
not likely to be serious. However, for distributions with few frequencies,
sizable differences may occur if only a few broad class intervals are used.

From a mathematical point of view, the mean obtained by Method I might 
be described as more exact than that obtained by Method II. However, from 
the point of view of sampling statistics, a sample mean obtained by Method II 
is likely to be just as representative of the population mean as the mean 
obtained by Method I; it may even be more representative. It is because of 
this consideration in particular that discrepancies in the results obtained with 
the two methods are usually judged to have little or no importance. 

Method III: The Mean—Short Method with Grouped Data

Once a frequency distribution of the data of a variable has been made,
Method III is the easiest method of computing the mean. It is called a short
method because the arithmetical computations are considerably simplified,
which means that the method not only requires less time but also is less
subject to computational errors than is Method II.

The principles underlying the short method can best be described by means 
of Tables 7:3, 7:4, and 7:5. 





A common device is employed in Table 7:3 to simplify the computations,
namely, the subtraction of a constant amount from all measures. Since the
lower limit of the class interval with the smallest value is 60, we could
subtract 60 from the mid-point of each class interval in the distribution. However,
the procedure is simplified even more if we subtract 69.5 from such
mid-points, since 69.5 is the mid-point value of the lowest interval. The
differences between the mid-point values of each interval and 69.5 are given
in the fourth column of the table. Since they vary only from 120 to zero, they
are obviously simpler to work with than the original mid-point values. The
products of these residual mid-point values and the frequencies for each
interval are shown in the last column.

Table 7:3. Arithmetic Mean—First Step in the Short Method
(Data from Table 7:2)

Class Intervals    f    Mid-Pt.   (Mid-Pt.) - 69.5   f(Mid-Pt. - 69.5)

180 and above      2    189.5          120                 240
160-179            2    169.5          100                 200
140-159            5    149.5           80                 400
120-139            9    129.5           60                 540
100-119            7    109.5           40                 280
80- 99             1     89.5           20                  20
60- 79             4     69.5            0                   0

              N = 30                                   S = 1680

Arithmetic Mean = 1680/30 + 69.5 = 56.0 + 69.5 = 125.5

The sum of the products of the last column is 1680. Dividing this by 30,
the number of cases, gives a mean of 56.0. This is the mean of the values in
the distribution of measures after 69.5 was subtracted from the mid-point value
of each class interval. The mean of the original measures will consequently
be obtained by adding 69.5 to the result. Thus,

56.0 + 69.5 = 125.5

Inasmuch as the frequency distribution used in Tables 7:2 and 7:3 had 
identical class intervals, the means obtained by Methods II and III should 
be and are equal. 

The procedure illustrated in Table 7:3 can be further simplified by dividing
the residual mid-point values shown in the fourth column by a constant.
Since all the residual mid-point values are multiples of 20 (the size of the class
interval), the constant to be used is obviously 20. The quotients resulting from this
division are given in the fifth column of Table 7:4. These new mid-point
values are now in the simplest arithmetic terms; they range from 6 to zero.
They are then multiplied by the number of frequencies or cases in the
respective class intervals. The products obtained appear in the last column of
Table 7:4. The sum of these products, 84, is then divided by 30, the number
of cases. The result, 2.8, is an average, being the mean of the measures of the
distribution with 69.5 subtracted from the mid-point value of each interval
and with the reduced mid-point values divided by 20.


Table 7:4 Arithmetic Mean—Short Method Continued 
(Data from Table 7:2) 


Class Intervals    f    Mid-Pt.   (Mid-Pt.) - 69.5   (Mid-Pt. - 69.5)/20   f[(Mid-Pt. - 69.5)/20]

180 and above      2    189.5          120                  6                    12
160-179            2    169.5          100                  5                    10
140-159            5    149.5           80                  4                    20
120-139            9    129.5           60                  3                    27
100-119            7    109.5           40                  2                    14
80- 99             1     89.5           20                  1                     1
60- 79             4     69.5            0                  0                     0

              N = 30                                                        Σ = 84

Arithmetic Mean = 20(84/30) + 69.5 = 20(2.8) + 69.5 = 125.5


The values derived from this simplified procedure now need to be
compensated for by multiplying in what was excluded by division, and by adding
what was subtracted. Since 20 was taken as a constant divisor, the mean of
the sum of the products in the last column, 2.8, is multiplied by 20. This
gives 56.0, and to it, as in Table 7:3, is added 69.5 (the amount subtracted).
This gives a mean of 125.5 for the distribution, a value which coincides (as
it should) with that obtained in Table 7:3.

The procedure illustrated in Table 7:4 is basically the simplest method for
the data of frequency distributions. The procedure used in Table 7:5 is
essentially the same as the preceding, except that the amount subtracted is taken
nearer the center of the distribution rather than at one end. Although in the
present example this procedure does not materially simplify the
computations, the labor involved in computing is usually greatly reduced when the
distributions have 15 or 20 or more class intervals.

The short method used in Table 7:5 is usually described as follows: 

1. The amount subtracted is called the guessed mean and is symbolized by
G.M.

2. The divisor is always taken as equal to the size of the class interval and is
symbolized by i.

3. The deviations (mid-points) for each class interval are symbolized by x'
after the subtraction and division of the preceding two steps.













4. As usual, frequencies are symbolized by f, and the number of cases in the
distribution by N.

It makes no difference in the result whether or not the guessed mean (the
constant which is subtracted) is taken exactly at the middle class interval of the
distribution. We shall see that the procedure used in Table 7:4, with the guessed
mean (69.5) taken at the lowest class interval, yields a result identical with
that obtained when the guessed mean is taken at the middle interval of the
distribution. The most practical procedure is to take it near the middle of the
distribution and at a class interval with a great many frequencies.

Table 7:5. Arithmetic Mean—Short Method Continued
(Data from Table 7:2)

Class Intervals    f    Mid-Pt.   (Mid-Pt.) - G.M.   x' = (Mid-Pt. - G.M.)/i    fx'

180 and above      2    189.5           60                   3                  6
160-179            2    169.5           40                   2                  4
140-159            5    149.5           20                   1                  5
120-139            9    129.5            0                   0                  0
100-119            7    109.5          -20                  -1                 -7
80- 99             1     89.5          -40                  -2                 -2
60- 79             4     69.5          -60                  -3                -12

              N = 30                                                     Σ = -6

G.M. = 129.5, i = 20, c = Σ(fx')/N = -6/30 = -.2

Arithmetic Mean = G.M. + ic = 129.5 + 20(-.2) = 125.5

In Table 7:5 the guessed mean has been taken at the fourth and middle class
interval, whose mid-point value is 129.5. This, then, is G.M., the guessed mean.
The differences are shown in the fourth column of the table, and the results of
dividing them by i, the size of the class interval, which is 20, are shown in the
fifth column. These quotients are the x' values for each interval. In the last
column, the frequencies of each class interval have been multiplied by the x'
values. The algebraic sum of the resulting products, -6, is then averaged by
dividing by the number of cases. This average for the data in Table 7:5 is equal
to -.2. The operation is symbolized as follows:

$$\frac{\Sigma(fx')}{N} = \frac{-6}{30} = -.2$$

Most authors describe this average as the correction and symbolize it by c. Actually, of course, this is an average of the residual values of the distribution, with G.M. and i taken out. The corrections are in reality the values of G.M. and i. However, we shall follow the usual practice and denote the average of the residual values as c. Thus,

$$c = \frac{\Sigma(fx')}{N}$$

The procedures used in the short method may therefore be symbolized as 
follows: 

$$M = \text{G.M.} + i\,\frac{\Sigma(fx')}{N} \quad\text{or}\quad M = \text{G.M.} + ic \qquad [7:3]$$

Arithmetic mean (M), from a guessed mean, G.M.

Substituting the values obtained in Table 7:5 in Formula 7:3, we have 

M = 129.5 + 20(-.2) 

= 129.5 - 4.0 
= 125.5 


This value of the mean checks (as it should) with the results obtained in 
Tables 7:3 and 7:4. 
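The short method lends itself to a mechanical check. The following is a minimal Python sketch, not part of the original text; the function name `short_method_mean` and its arguments are illustrative assumptions. It applies Formula 7:3 to the mid-points and frequencies of Table 7:5 and confirms that the guessed mean may be placed at the middle or the lowest class interval without changing the result. It also reproduces the mean of the Bernreuter scores in Table 7:7, whose mid-points (for example, -35.5 for the interval -50 to -21) are assumed to follow the same convention as Table 7:5.

```python
def short_method_mean(midpoints, freqs, guessed_mean, i):
    """Arithmetic mean by the short method, M = G.M. + ic (Formula 7:3).

    Hypothetical helper: midpoints and freqs describe the class intervals,
    guessed_mean is G.M., and i is the size of the class interval.
    """
    n = sum(freqs)
    # x' = (Mid-Pt. - G.M.) / i, each mid-point's deviation in interval units
    x_prime = [(m - guessed_mean) / i for m in midpoints]
    # c = sum(fx') / N, the "correction"
    c = sum(f * x for f, x in zip(freqs, x_prime)) / n
    return guessed_mean + i * c


# Table 7:5 (data from Table 7:2)
mids = [189.5, 169.5, 149.5, 129.5, 109.5, 89.5, 69.5]
freqs = [2, 2, 5, 9, 7, 1, 4]
print(round(short_method_mean(mids, freqs, 129.5, 20), 1))  # 125.5, G.M. at middle interval
print(round(short_method_mean(mids, freqs, 69.5, 20), 1))   # 125.5, G.M. at lowest interval

# Table 7:7 (Bernreuter scores, i = 30); mid-points assumed as described above
bern_mids = [114.5, 84.5, 54.5, 24.5, -5.5, -35.5, -65.5, -95.5, -125.5, -155.5]
bern_freqs = [2, 1, 8, 4, 16, 24, 23, 14, 7, 1]
print(round(short_method_mean(bern_mids, bern_freqs, -35.5, 30), 1))  # -39.7
```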

In practice, the procedure used in Table 7:5 is simplified by omitting the 
third and fourth columns. This has been done in Table 7:6, which therefore 
illustrates the final table for the simplified procedure used in Method III. 
This procedure is further illustrated in Table 7:7 for another group of data 
which includes both negative and positive numbers. This distribution is 
somewhat skewed, i.e., not bilaterally symmetrical with respect to the mean, 
but it is uni-modal and the skewness is so slight that it does not affect the
usefulness of the mean as a measure of central tendency. 


Table 7:6. Arithmetic Mean—Final Table for Short Method
(Data from Table 7:2)

Class Intervals    f     x'    fx'
180 and above      2     3      6
160-179            2     2      4
140-159            5     1      5
120-139            9     0      0
100-119            7    -1     -7
80-99              1    -2     -2
60-79              4    -3    -12
               N = 30       Σ = -6

G.M. = 129.5; i = 20; c = Σ(fx')/N = -6/30 = -.2
Arithmetic Mean = G.M. + ic = 129.5 + 20(-.2) = 125.5









Table 7:7. Arithmetic Mean—Short Method
(Data: Bernreuter Personality Inventory Scores for 100 College Freshmen)

Class Intervals      f      x'     fx'
100 to 129           2      5      10
70 to 99             1      4       4
40 to 69             8      3      24
10 to 39             4      2       8
-20 to 9            16      1      16
-50 to -21          24      0       0
-80 to -51          23     -1     -23
-110 to -81         14     -2     -28
-140 to -111         7     -3     -21
-170 to -141         1     -4      -4
                N = 100        Σ = -14

G.M. = -35.5; i = 30; c = Σ(fx')/N = -14/100 = -.14
Arithmetic Mean = -35.5 + 30(-.14) = -39.7


C. THE STANDARD DEVIATION 
Definition 

The standard deviation is the most universally used measure of variability. It is defined as the square root of the mean of the squared deviations of all measures in a distribution from the mean of that distribution. This definition, as in the case of most statistical definitions, summarizes the operations involved in computing the measure. However, it should be noted that the standard deviation is an average measure: it is the square root of the second moment about the mean.

In order to describe a distribution, it should be obvious by now that a 
measure of variability is needed in addition to the mean. Two distributions of 
a variable may have similar means but differ markedly in the extent of 
scatter of their respective measures about the means. 

With respect to sampling theory and problems of analytical statistics, the standard deviation is widely employed as the standard measure of variability, or deviational tendency. It is generally the most reliable of all the measures of deviational tendency.* The standard deviation, when derived from distributions of the normal probability type, is widely employed in psychological measurement since it is especially relevant to the development and interpretation of standard scores (see Chapter 8). In fact, the standard deviation has come to be a relative unit of measurement (differentiation) for most psychological scales of abilities.

* The meaning of the statistical concept of reliability is developed in some detail in Chap. 17, Section B. At this point, it is sufficient to point out that in sampling theory any measure of a distribution derived from a random sample, or series of random samples, is the more reliable, the less it differs in value from the value of that measure for the population or universe as a whole.





The three methods used for computing the standard deviation have their analogues in the computation of the mean. Thus,

I. A long method, ungrouped data.

II. A long method, grouped data.

III. A short method, grouped data.

IIIa. A short method, ungrouped data.

Method III is by far the easiest and the least subject to computational errors, as the following examples will show.

Method I: Standard Deviation from Ungrouped Data 

The disadvantages of the first method for computing the standard deviation are similar to those of Method I for the mean; in addition, the computations are even more laborious, and unnecessarily so. The method is illustrated in Table 7:8.


Table 7:8. Standard Deviation—Long Method for Ungrouped Data
(Arithmetic Mean = 124.3; Data from Table 7:1)

Subject   Score   Deviation   Deviation         Subject   Score   Deviation   Deviation
Number    (X)     (x)         Squared (x²)      Number    (X)     (x)         Squared (x²)
 1        146      21.7          470.89          16       144      19.7          388.09
 2        180      55.7        3,102.49          17       172      47.7        2,275.29
 3         60     -64.3        4,134.49          18       126       1.7            2.89
 4        104     -20.3          412.09          19       130       5.7           32.49
 5        108     -16.3          265.69          20       150      25.7          660.49
 6        132       7.7           59.29          21       126       1.7            2.89
 7        152      27.7          767.29          22       176      51.7        2,672.89
 8        116      -8.3           68.89          23       112     -12.3          151.29
 9         76     -48.3        2,332.89          24       120      -4.3           18.49
10        180      55.7        3,102.49          25       122      -2.3            5.29
11         72     -52.3        2,735.29          26        96     -28.3          800.89
12         72     -52.3        2,735.29          27       120      -4.3           18.49
13        126       1.7            2.89          28       132       7.7           59.29
14        152      27.7          767.29          29       108     -16.3          265.69
15        116      -8.3           68.89          30       104     -20.3          412.09
                           Σ = 21,026.15                                   Σ = 7,766.55

Σ(x²) = 21,026.15 + 7,766.55 = 28,792.7
Standard Deviation = √(28,792.7/30) = 30.98, or 31.0


The mean must first be computed. The mean for the data in Table 7:1, which have been used in Table 7:8, was found to be 124.3. The deviations (x) are next obtained:

$$x = X - M,$$


















each deviation being the difference between a measure (X) and the mean 
(Mx) of the distribution. The direction of the differences is indicated by the 
use of negative signs for negative deviations. The deviation for each of the 
30 cases is given in the x columns of Table 7:8. 

The third step consists in squaring each deviation; these computations are facilitated by a table of squares.* The square of each deviation is given in the x² columns of the table. It is usually sufficient in descriptive statistics to carry the squares to two decimal places for data originally obtained in integral values. Summing the x² values for all the measures of the distribution completes the preliminary computations necessary for calculating the standard deviation by this method. As indicated at the bottom of the table, the average of the sum of the x²'s is obtained and the square root of this average is computed. For these data, the standard deviation is found to be 31.0. This measure is therefore the square root of the mean of the squared deviations (taken from the mean). In practice, the standard deviation is symbolized by σ (Greek sigma).

The preceding computations may be symbolized as follows: 

$$\sigma = \sqrt{\frac{\Sigma(x^2)}{N}} \qquad [7:4]$$

Standard deviation (σ), for ungrouped data


Variance 


The square of σ, viz., σ², is called the variance of a distribution, and is used in many problems of sampling and analytical statistics. The variance of a distribution is the mean of the squared deviations and is, as was earlier indicated, the second moment, μ₂, about the mean:

$$\sigma^2 = \frac{\Sigma(x^2)}{N} \qquad [7:4a]$$

Variance
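Formulas 7:4 and 7:4a translate directly into a few lines of code. The sketch below is an illustration, not the author's procedure; the function name is hypothetical, and the 30 scores are those of Table 7:1 as listed in the X columns of Table 7:11. Because the mean is carried to full precision instead of being rounded to 124.3, the result agrees with Table 7:8 to two decimals. The standard library's `statistics.pstdev`, which likewise divides by N, serves only as a cross-check.

```python
import math
import statistics

# Scores of subjects 1-30 (Table 7:1, as listed in Table 7:11)
scores = [146, 180, 60, 104, 108, 132, 152, 116, 76, 180,
          72, 72, 126, 152, 116, 144, 172, 126, 130, 150,
          126, 176, 112, 120, 122, 96, 120, 132, 108, 104]


def sd_long_method(values):
    """Return (sigma, variance) by the long method for ungrouped data."""
    n = len(values)
    mean = sum(values) / n
    squared_devs = [(x - mean) ** 2 for x in values]  # the x-squared column
    variance = sum(squared_devs) / n                   # Formula 7:4a (divisor N)
    return math.sqrt(variance), variance               # Formula 7:4


sigma, variance = sd_long_method(scores)
print(round(sigma, 2), round(variance, 2))   # 30.98 959.76
print(round(statistics.pstdev(scores), 2))   # 30.98
```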


Method II: Standard Deviation—Long Method with Grouped Data 

The data in Table 7:2 are used to illustrate Method II for computing σ. This method likewise involves unnecessarily laborious computations and consequently is rarely used. Method III, the short method, gives exactly the same result as Method II (except for the possible effect of dropped decimals on the result).

The steps of this method, shown in Table 7:9, can be summarized as follows: 

1. The mean is first obtained from the frequency distribution of grouped data. (The mean for the frequency distribution of Table 7:9 was computed in Table 7:2 and was found to be 125.5.)

2. The deviations, x (the differences between each mid-point and the mean), are next computed for each class interval.

3. These deviations are then squared to give the x² values.

* See Table I, Appendix C, for squares of integers from 1 to 1000.





Table 7:9. Standard Deviation—Long Method with Grouped Data
(Arithmetic Mean = 125.5; Data from Table 7:2)

Class Intervals    f    Mid-Pt.      x          x²          f(x²)
180 and above      2    189.5       64.0      4,096.0      8,192.0
160-179            2    169.5       44.0      1,936.0      3,872.0
140-159            5    149.5       24.0        576.0      2,880.0
120-139            9    129.5        4.0         16.0        144.0
100-119            7    109.5      -16.0        256.0      1,792.0
80-99              1     89.5      -36.0      1,296.0      1,296.0
60-79              4     69.5      -56.0      3,136.0     12,544.0
               N = 30                                  Σ = 30,720.0

x = Mid-Pt. - Mean = Mid-Pt. - 125.5
σ = √(Σf(x²)/N) = √(30,720.0/30) = √1,024.0 = 32.0


4. The x²'s are multiplied by f, the frequencies of their respective class intervals. (These values are given in the last column of Table 7:9.)

5. The f(x²)'s are added to obtain the sum of all squared deviations: Σf(x²).

6. This sum is averaged; i.e., it is divided by N to obtain the mean of the squared deviations.

7. The standard deviation, σ, is then obtained by extracting the square root of the mean of the squared deviations.


The preceding steps in the computation of σ by Method II may be symbolized as follows:

$$\sigma = \sqrt{\frac{\Sigma f(x^2)}{N}} \qquad [7:5]$$

Standard deviation (σ), for variate data grouped into class intervals
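Method II can be sketched in the same way. The Python below is an illustration under the assumption that each class is represented by its mid-point, exactly as in Table 7:9; the function name `sd_long_grouped` is hypothetical. With these mid-points every intermediate value is exact in binary floating point, so the mean and σ come out as 125.5 and 32.0.

```python
import math

# Table 7:9: mid-points and frequencies (data from Table 7:2)
mids = [189.5, 169.5, 149.5, 129.5, 109.5, 89.5, 69.5]
freqs = [2, 2, 5, 9, 7, 1, 4]


def sd_long_grouped(midpoints, frequencies):
    """Return (mean, sigma) for grouped data by the long method (Formula 7:5)."""
    n = sum(frequencies)
    mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n
    # f(x^2), with x = Mid-Pt. - Mean for each class interval
    sum_fx2 = sum(f * (m - mean) ** 2 for f, m in zip(frequencies, midpoints))
    return mean, math.sqrt(sum_fx2 / n)


mean, sigma = sd_long_grouped(mids, freqs)
print(mean, sigma)  # 125.5 32.0
```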


For the data in Table 7:9, σ is equal to 32.0, which varies only slightly from the value obtained for the ungrouped data in Table 7:8. As previously indicated for the mean, such a discrepancy is to be expected between the results obtained with Method I, for which the data are ungrouped, and those obtained with Method II, for which the data are grouped into a frequency distribution.


Method III: Standard Deviation—Short Method with Grouped Data

The simplest procedure for computing the standard deviation is Method III. 
The arithmetic is simple, especially when compared with that required for 
the first two methods. 

The procedure is illustrated in Table 7:10, the data and computations for 
the mean being taken from Table 7:6. In fact, only one additional column of 




















Table 7:10. Standard Deviation—Short Method with Grouped Data and Charlier's Check on Computations
(Data from Table 7:6)

Class Intervals    f     x'    f(x')    f(x'²)    (Check) f(x' + 1)²
180 and above      2     3       6        18             32
160-179            2     2       4         8             18
140-159            5     1       5         5             20
120-139            9     0       0         0              9
100-119            7    -1      -7         7              0
80-99              1    -2      -2         4              1
60-79              4    -3     -12        36             16
               N = 30      Σ = -6    Σ = 78         Σ = 96

c = Σf(x')/N = -6/30 = -.2; c² = .04
σ = i√(Σf(x'²)/N - c²) = 20√(78/30 - .04) = 20√(2.6 - .04) = 20√2.56 = 20(1.6) = 32.0

Check: Σf(x' + 1)² = Σf(x'²) + 2Σf(x') + N
       96 = 78 + 2(-6) + 30
       96 = 96


computations is required by this method. After the f(x') values are obtained for each class interval, these values are multiplied by the corresponding x' values to obtain f(x'²). Algebraically,

$$x'(fx') = f(x'^2)$$

and it is therefore not necessary to have a separate column of x'² values to be multiplied by f.

Summing the f(x'²) column gives, for the total distribution, the sum of the products of the frequencies and the squared deviations (taken from the guessed mean, with the size of the intervals, i, excluded by division). This sum, 78, is given at the bottom of the next to the last column of the table. The summation may be symbolized as follows:

$$\Sigma f(x'^2)$$

As in the case of the short method of computing the mean, it is now necessary to restore to this result what was eliminated by subtraction and division. The correction, c, used for the mean, is squared and subtracted from Σf(x'²)/N, the mean of the squared unit deviations taken from the guessed mean.

This value, c², is always subtracted (regardless of whether the guessed mean is greater or less than the actual value of the mean), because the average of the squared deviations from a guessed mean not equal to the actual mean will always be too large, never too small. This is the case because the average of the squared unit deviations taken from the guessed mean exceeds the average of the squared unit deviations taken from the actual mean by exactly c², and c², being a square, is positive whichever side of the mean the guessed mean lies on.
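The reason c² is always subtracted can be verified numerically. The Python sketch below is an illustration, not from the text; the helper name `mean_square_about` is hypothetical. It computes the mean of the squared deviations of the Table 7:10 mid-points from several candidate guessed means and shows that each exceeds σ² by exactly (M - G.M.)². Dividing that identity through by i² gives the unit-deviation form used in the short method, Σf(x'²)/N = (σ/i)² + c².

```python
# Mid-points and frequencies of Table 7:10 (data from Table 7:2)
mids = [189.5, 169.5, 149.5, 129.5, 109.5, 89.5, 69.5]
freqs = [2, 2, 5, 9, 7, 1, 4]
n = sum(freqs)
mean = sum(f * m for f, m in zip(freqs, mids)) / n  # 125.5


def mean_square_about(point):
    """Mean of the squared deviations of the mid-points from an arbitrary point."""
    return sum(f * (m - point) ** 2 for f, m in zip(freqs, mids)) / n


variance = mean_square_about(mean)  # sigma squared, 1024.0
for gm in (69.5, 129.5, 189.5):
    excess = mean_square_about(gm) - variance
    # The excess equals (M - G.M.)^2, so it is never negative
    print(gm, excess, (mean - gm) ** 2)
# Excesses printed: 3136.0, 16.0 and 4096.0, each matching (M - G.M.)^2
```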






















After c² is subtracted from the average of the squared deviations, the square root of this corrected average is next obtained. This root value is then multiplied by i, the size of the class interval, to give the standard deviation for the distribution of original scores. For the data in Table 7:10, σ equals 32.0, exactly the same value as was obtained by Method II for this distribution of grouped data in Table 7:9.

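The preceding operations for the computation of σ by the short method may be symbolized as follows:

$$\sigma = i\sqrt{\frac{\Sigma f(x'^2)}{N} - \left(\frac{\Sigma f(x')}{N}\right)^2} \quad\text{or}\quad \sigma = i\sqrt{\frac{\Sigma f(x'^2)}{N} - c^2} \qquad [7:6]$$

Standard deviation (σ), short method from guessed mean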
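Formula 7:6 can be sketched in Python as follows. This is an illustration rather than the book's procedure; the name `sd_short_method` is hypothetical. It reproduces σ = 32.0 for Table 7:10 and, as the argument for always subtracting c² implies, gives the same σ when the guessed mean is moved to the lowest interval.

```python
import math


def sd_short_method(midpoints, freqs, guessed_mean, i):
    """Sigma by the short method from a guessed mean (Formula 7:6).

    Hypothetical helper; returns (c, sigma).
    """
    n = sum(freqs)
    x_prime = [(m - guessed_mean) / i for m in midpoints]          # unit deviations
    c = sum(f * x for f, x in zip(freqs, x_prime)) / n             # the correction
    mean_sq = sum(f * x * x for f, x in zip(freqs, x_prime)) / n   # sum f(x'^2) / N
    return c, i * math.sqrt(mean_sq - c ** 2)


mids = [189.5, 169.5, 149.5, 129.5, 109.5, 89.5, 69.5]
freqs = [2, 2, 5, 9, 7, 1, 4]
c, sigma = sd_short_method(mids, freqs, 129.5, 20)
print(round(c, 2), round(sigma, 1))                        # -0.2 32.0
print(round(sd_short_method(mids, freqs, 69.5, 20)[1], 1))  # 32.0 with another G.M.
```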

Charlier's Check 



It is well always to have an independent method by which to check a 
series of arithmetic or algebraic computations. If no simple checking methods 
are available, the original operations may need to be repeated. However, 
Charlier's check is convenient in checking the basic operations required by 
the short method for computing the mean and standard deviation. 

The use of this check is illustrated in the last column and at the bottom 
of Table 7:10. First, 1 is added to the unit deviations shown in column x';
the resulting sums for each class interval are then squared and multiplied 
by their respective frequencies. Thus, x' for the highest class interval in the 
table is 3. Adding 1 to 3 gives 4, and squaring this gives 16. The number of 
frequencies for this class interval is 2; hence the product for the check column 
is 2(16) = 32. 

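As indicated at the bottom of the table, the sum of the check column, Σf(x' + 1)², is equal to the sum of the f(x'²) column plus twice the sum of the f(x') column plus N, the total number of frequencies. This follows from expanding (x' + 1)² = x'² + 2x' + 1, multiplying by f, and summing over all the class intervals.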

Charlier's check tests the accuracy of all the computations within the table which yield the basic sums needed for the final computation of M and σ, but it does not check the accuracy of the latter two values.
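As a sketch of how the check catches slips, the Python below (hypothetical function name, not from the text) recomputes the check column directly from f and x' and compares it with the hand-entered f(x') and f(x'²) columns of Table 7:10. The deliberately miscopied entry in `bad_fx2` (32 instead of 36 for the interval 60-79) is an invented example of a clerical error, not part of the original data.

```python
def charlier_check(freqs, x_prime, fx_column, fx2_column):
    """Charlier's check on hand-computed columns of a short-method table.

    Returns True when sum f(x'+1)^2 == sum f(x'^2) + 2 * sum f(x') + N.
    """
    n = sum(freqs)
    # The check column is computed independently of the two columns being checked
    check_column = [f * (x + 1) ** 2 for f, x in zip(freqs, x_prime)]
    return sum(check_column) == sum(fx2_column) + 2 * sum(fx_column) + n


# Columns of Table 7:10
freqs = [2, 2, 5, 9, 7, 1, 4]
x_prime = [3, 2, 1, 0, -1, -2, -3]
fx = [6, 4, 5, 0, -7, -2, -12]
fx2 = [18, 8, 5, 0, 7, 4, 36]
bad_fx2 = [18, 8, 5, 0, 7, 4, 32]  # hypothetical slip in the last entry

print(charlier_check(freqs, x_prime, fx, fx2))      # True
print(charlier_check(freqs, x_prime, fx, bad_fx2))  # False
```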


Method IIIa: Standard Deviation—Short Method with Ungrouped Data

Method III can also be used for computing the standard deviation for a 
set of original ungrouped data of a variable whose measures are positive 
integral values (or can readily be treated as such). 

We have not called this procedure a fourth method because it is only a special case of Method III. It is widely employed in machine computations of the mean and standard deviation. Its disadvantage lies in the fact that, since a frequency distribution is unnecessary, the investigator is likely to obtain and interpret his results for a variable without concerning himself with the form of the distribution from which the mean and σ are obtained.





This neglect may lead to serious errors in interpreting both the mean and 
standard deviations in the case of distributions that depart radically from 
the normal, bell-shaped type. This is especially true for extremely skewed, 
uni-modal distributions as well as the U- and J-types. 


Table 7:11. Standard Deviation—Short Method with Ungrouped Data
(Data from Table 7:1)

Subject No.    X (x')    X² (x'²)         Subject No.    X (x')    X² (x'²)
 1             146        21,316          16             144        20,736
 2             180        32,400          17             172        29,584
 3              60         3,600          18             126        15,876
 4             104        10,816          19             130        16,900
 5             108        11,664          20             150        22,500
 6             132        17,424          21             126        15,876
 7             152        23,104          22             176        30,976
 8             116        13,456          23             112        12,544
 9              76         5,776          24             120        14,400
10             180        32,400          25             122        14,884
11              72         5,184          26              96         9,216
12              72         5,184          27             120        14,400
13             126        15,876          28             132        17,424
14             152        23,104          29             108        11,664
15             116        13,456          30             104        10,816
            Σ = 1,792   Σ = 234,760                  Σ = 1,938   Σ = 257,796

ΣX = 1,792 + 1,938 = 3,730; Σ(X²) = 234,760 + 257,796 = 492,556
G.M. = 0; i = 1; c = ΣX/N = 3,730/30 = 124.3 = Arithmetic Mean (M)
σ = √(Σ(X²)/N - c²) = √(492,556/30 - (124.3)²) = √(16,418.53 - 15,450.49) = √968.04 = 31.1


The computation procedure, illustrated in Table 7:11 with the data of 
Table 7:1, may be summarized as follows: 

1. Each original score (X) is taken as a deviation from a guessed mean equal to zero. Hence,

X - G.M. = X - 0 = X, and X = x'

2. The sum of the original scores (X) divided by the number of cases (N) gives the mean. This is equal to c, the correction:

$$c = \frac{\Sigma X}{N} = \frac{\Sigma x'}{N}, \quad\text{since } X = x'$$

Since the original scores are integral values, i, the size of the “class 
intervals” for such ungrouped data, is equal to 1.0. 

















3. Each original score is squared to give the square of the deviations: X² = x'².

4. The squared deviations are summed to give Σ(X²), which equals Σ(x'²).

5. The correction squared (which in this case is the mean squared) is subtracted from the mean of the squared deviations, Σ(X²)/N.

6. The square root of the result obtained in the preceding step is the desired value of σ.

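The value of σ obtained by the short method in Table 7:11 (31.1) is similar to that obtained by the long method with ungrouped data in Table 7:8 (31.0). The two should be identical, except for dropped decimals, inasmuch as the data in both cases were obtained from the same arrangement of measures, i.e., they were ungrouped. Here the discrepancy arises chiefly because the mean was rounded to 124.3 before being squared; carrying it to more decimal places (3,730/30 = 124.333...) gives 30.98, in agreement with Table 7:8.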

The preceding operations may be symbolized by Formula 7:6, since Method IIIa is simply a special case of Method III. However, they are often symbolized in terms of the original measures, as follows:

$$\sigma = \sqrt{\frac{\Sigma(X^2)}{N} - \left(\frac{\Sigma X}{N}\right)^2} \qquad [7:6a]$$

Standard deviation, σ, special case of Formula 7:6

where X equals the values of the original positive integral measures, and i, the size of the "intervals," is therefore equal to 1.0.
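Formula 7:6a needs only two running totals, ΣX and Σ(X²), which is why it suited machine computation. The Python sketch below is illustrative (the function name and its optional `mean` argument are assumptions, not from the text). It shows that the 31.1 of Table 7:11 comes from rounding the mean to 124.3 before squaring, while the unrounded mean reproduces the 30.98 of Table 7:8. One known caveat of this raw-sum form is that, for scores that are large relative to their spread, the subtraction of two nearly equal quantities can lose precision.

```python
import math

# Scores of subjects 1-30 (Table 7:11)
scores = [146, 180, 60, 104, 108, 132, 152, 116, 76, 180,
          72, 72, 126, 152, 116, 144, 172, 126, 130, 150,
          126, 176, 112, 120, 122, 96, 120, 132, 108, 104]
n = len(scores)
sum_x = sum(scores)                  # sum of X
sum_x2 = sum(x * x for x in scores)  # sum of X^2


def sd_from_raw_sums(sum_x, sum_x2, n, mean=None):
    """Formula 7:6a: sigma = sqrt(sum(X^2)/N - (sum(X)/N)^2).

    Hypothetical helper; pass `mean` to imitate a rounded hand value.
    """
    if mean is None:
        mean = sum_x / n
    return math.sqrt(sum_x2 / n - mean ** 2)


print(sum_x, sum_x2)                                        # 3730 492556
print(round(sd_from_raw_sums(sum_x, sum_x2, n), 2))         # 30.98 (mean unrounded)
print(round(sd_from_raw_sums(sum_x, sum_x2, n, 124.3), 1))  # 31.1 (mean rounded to 124.3)
```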

Sheppard's Correction* for σ

When a variable has only a few broad classes, a mathematical error arises in computing the standard deviation for distributions of the normal bell-shaped type, because, as was indicated in Chapter 5, the mid-points of the class intervals do not coincide with the actual means of the cases within the intervals. When many class intervals are used, the difference is negligible; however, when there are fewer than ten or twelve, it is sometimes worth while to correct for the constant error that arises. Such a correction was developed by Sheppard. It is easy to apply, and hence its use may well be considered when distributions are of the normal bell-shaped type but have few class intervals.

We shall illustrate its use with the personal tempo data in Table 7:10, even 
though this distribution is not particularly bell-shaped. Only 7 class intervals 
were used for the distribution of these scores because of the few cases 
(only 30), rather than because of any limitations inherent in the measures 
themselves. Actually the range of scores was from 60 to 180, and many more 
class intervals could have been used if there had been enough frequencies to 
warrant smaller intervals. 

* W. F. Sheppard, "The Calculation of the Moments of a Frequency Distribution," Biometrika, 5:450-459, 1907.






The correction itself is a constant, equal to 1/12, or .0833. This constant is subtracted from the average of the squared unit-deviations taken from the actual rather than the guessed mean, as follows:

$$\sigma_{\text{corr.}} = i\sqrt{\frac{\Sigma f(x'^2)}{N} - c^2 - .0833} \qquad [7:7]$$

Standard deviation, σ, with Sheppard's correction for broad classes

For the data in Table 7:10:

σ = 20√2.56 = 20(1.6) = 32.0
σ_corr. = 20√(2.56 - .0833) = 20(1.57) = 31.4

Thus the corrected standard deviation for the distribution of these personal tempo scores is 31.4 instead of 32.0. Mathematically this difference is noticeable, but psychologically it is not very important. That is to say, a difference of six-tenths of a unit on the metronome scale on which these personal tempo scores were based makes little or no difference in the psychological interpretation of the result. Furthermore, in sampling statistics the error arising from broad classes is often small compared with errors of sampling and measurement. Sheppard's correction is an unnecessary mathematical over-refinement in such situations.

It will be observed that this correction of .0833 is always subtracted from the average of the squared unit-deviations from the mean because the constant error arising from broad classes increases rather than decreases the size of the deviations: in a normal, bell-shaped distribution, there are more cases between the mid-point of a class interval and the limit of the interval that is nearer the mean of the distribution than between the mid-point and the other limit of the interval. The correction is not necessary for the arithmetic mean because the errors for intervals above the mean will tend to cancel out the errors for intervals below the mean.
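Formula 7:7 can be sketched from the three sums at the foot of Table 7:10. In the Python below (function name hypothetical), the exact constant 1/12 is used in place of .0833. Carried to more places, the corrected value is about 31.47; the 31.4 in the text results from rounding the square root to 1.57 before multiplying by 20. In the original units the same correction amounts to subtracting i²/12 from σ², which the test confirms.

```python
import math


def sd_with_sheppard(sum_fx, sum_fx2, n, i):
    """Return (sigma, corrected sigma) from short-method sums (Formulas 7:6 and 7:7).

    Hypothetical helper: sum_fx = sum f(x'), sum_fx2 = sum f(x'^2).
    """
    c = sum_fx / n
    unit_var = sum_fx2 / n - c ** 2                # mean squared unit-deviation from the actual mean
    sigma = i * math.sqrt(unit_var)
    sigma_corr = i * math.sqrt(unit_var - 1 / 12)  # Sheppard: subtract 1/12 (about .0833)
    return sigma, sigma_corr


sigma, sigma_corr = sd_with_sheppard(-6, 78, 30, 20)  # sums from Table 7:10
print(round(sigma, 1), round(sigma_corr, 2))          # 32.0 31.47
```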

D. THE AVERAGE DEVIATION 

The average deviation, A.D., sometimes referred to as the mean deviation, is another statistical measure that is used to summarize the deviational tendency of the measures of a variable. However, it is not used as commonly in statistical practice as is the standard deviation. For one thing, the average deviation is awkward to treat algebraically, since it can be obtained only by disregarding signs: the algebraic sum of the deviations from the mean is always equal to zero. From the point of view of sampling, furthermore, the standard deviation is a more reliable measure of deviational tendency than is the average deviation. Nevertheless, because of its occasional use, the computation of the average deviation will be briefly described.

Definition 

The average deviation is the arithmetic mean of the differences between the measures of a distribution and the mean of that distribution. These differences (or deviations) are summed without regard to the direction of the differences from the mean of the distribution. If X symbolizes any measure of a distribution, and x symbolizes a deviation, the average deviation is equal to the following:

$$\text{A.D.} = \frac{S(x)}{N} \qquad [7:8]$$

Average deviation, ungrouped data (A.D.)

where S symbolizes the operation of summing the deviations obtained, the direction of the differences being neglected.

Two methods for computing the average deviation will be described: 
Method I, the average deviation from ungrouped data; and Method II, the 
average deviation from grouped data. 


Method I: Average Deviation—Ungrouped Data 


Method I is illustrated in Table 7:12, in which the data from Table 7:1 
have been used. We saw in Table 7:1 that the mean for the 30 personal tempo 
scores was 124.3. As indicated in the third column of Table 7:12, the difference 
(x) between each score and the mean is obtained without regard to sign. 
The sum of these differences is 718.0. Therefore, 


$$\text{A.D.} = \frac{S(x)}{N} = \frac{718.0}{30} = 23.9$$


Table 7:12. Average Deviation—Ungrouped Data
(Arithmetic Mean = 124.3; Data from Table 7:1)

Subject No.   X (Score)   x* (X - M)       Subject No.   X (Score)   x* (X - M)
 1            146          21.7            16            144          19.7
 2            180          55.7            17            172          47.7
 3             60          64.3            18            126           1.7
 4            104          20.3            19            130           5.7
 5            108          16.3            20            150          25.7
 6            132           7.7            21            126           1.7
 7            152          27.7            22            176          51.7
 8            116           8.3            23            112          12.3
 9             76          48.3            24            120           4.3
10            180          55.7            25            122           2.3
11             72          52.3            26             96          28.3
12             72          52.3            27            120           4.3
13            126           1.7            28            132           7.7
14            152          27.7            29            108          16.3
15            116           8.3            30            104          20.3
                        S = 468.3                                 S = 249.7

A.D. = S(x)/N = (468.3 + 249.7)/30 = 718.0/30 = 23.9


* Ordinarily x is always summed with regard to sign. However, as indicated in the text, the sign is disregarded in obtaining the average deviation. The operation of summing is therefore symbolized by S rather than the Greek symbol Σ.
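Formula 7:8 is sketched below in Python (the function name is hypothetical). Incidentally, 15 of the 30 scores lie above the mean and 15 below it, so the sum of the unsigned deviations is 718.0 whether the mean is taken as 124.3 or carried to full precision; rounding the mean therefore does not affect the A.D. for these data.

```python
# Scores of subjects 1-30 (Table 7:1, as listed in Table 7:12)
scores = [146, 180, 60, 104, 108, 132, 152, 116, 76, 180,
          72, 72, 126, 152, 116, 144, 172, 126, 130, 150,
          126, 176, 112, 120, 122, 96, 120, 132, 108, 104]


def average_deviation(values):
    """A.D. = S(x)/N (Formula 7:8): deviations from the mean, signs disregarded."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


print(round(average_deviation(scores), 1))  # 23.9
```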



















Method II: Average Deviation — Grouped Data 

The data in Table 7:3 used to compute the arithmetic mean are utilized in computing the average deviation by Method II, shown in Table 7:13. Arithmetically, the procedure is the same as that followed in Table 7:12, except that the original scores are grouped into 7 class intervals and the mid-point value of each interval is taken as representative of the scores in each interval. For these grouped data, the arithmetic mean was found to be 125.5. This value is therefore subtracted from the mid-point value of each class interval. These differences (deviations), taken without regard to sign, are then multiplied by the frequencies for each interval, giving the results shown in the last column of Table 7:13. The sum of these products, 744.0, divided by N, the number of cases in the distribution, gives an average deviation equal to 24.8.


Table 7:13. Average Deviation—Grouped Data
(Arithmetic Mean = 125.5; Data from Table 7:2)

Class Intervals    f    Mid-Pt.    x = (Mid-Pt.) - M    f(x)
180 and above      2    189.5           64.0            128.0
160-179            2    169.5           44.0             88.0
140-159            5    149.5           24.0            120.0
120-139            9    129.5            4.0             36.0
100-119            7    109.5           16.0            112.0
80-99              1     89.5           36.0             36.0
60-79              4     69.5           56.0            224.0
               N = 30                               S = 744.0

Average Deviation = Sf(x)/N = 744.0/30 = 24.8


Formula 7:8 (the A.D.) thus becomes as follows for the data of a variate grouped into class intervals:

$$\text{A.D.} = \frac{Sf(x)}{N} \qquad [7:9]$$

A.D. for variate data grouped into class intervals
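Formula 7:9 in the same style: a minimal Python sketch (hypothetical function name) that takes the mid-points of Table 7:13 as representative of each class, as the text does, and reproduces the A.D. of 24.8.

```python
# Mid-points and frequencies of Table 7:13 (data from Table 7:2)
mids = [189.5, 169.5, 149.5, 129.5, 109.5, 89.5, 69.5]
freqs = [2, 2, 5, 9, 7, 1, 4]


def average_deviation_grouped(midpoints, frequencies):
    """A.D. = S f(x) / N (Formula 7:9), with x = Mid-Pt. - M taken without sign."""
    n = sum(frequencies)
    mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n  # 125.5
    return sum(f * abs(m - mean) for f, m in zip(frequencies, midpoints)) / n


print(average_deviation_grouped(mids, freqs))  # 24.8
```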

As in computing the arithmetic mean and the standard deviation from ungrouped and grouped data, there is a slight difference in the values of the average deviations shown in Tables 7:12 and 7:13. As previously indicated, this is to be expected because the mid-points of the class intervals do not ordinarily give the exact average of the original scores within the intervals.

It will be observed that the A.D. is never greater in value than the standard deviation, and is ordinarily less. For the data in Table 7:13, the A.D. was 24.8, whereas in Table 7:9 σ for the same data was found to be 32.0.
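The relation between the two measures can be checked in a short Python sketch (the helper name is hypothetical). For the Table 7:1 scores the ratio A.D./σ is about .77. The hypothetical set [40, 60, 40, 60], in which every deviation has the same size, shows the one case of equality. For a normal distribution the ratio is √(2/π), about .80.

```python
import math

# Scores of subjects 1-30 (Table 7:1)
scores = [146, 180, 60, 104, 108, 132, 152, 116, 76, 180,
          72, 72, 126, 152, 116, 144, 172, 126, 130, 150,
          126, 176, 112, 120, 122, 96, 120, 132, 108, 104]
equal_devs = [40, 60, 40, 60]  # hypothetical: every deviation from the mean is 10


def ad_and_sd(values):
    """Return (average deviation, standard deviation), both with divisor N."""
    n = len(values)
    mean = sum(values) / n
    ad = sum(abs(x - mean) for x in values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return ad, sd


for data in (scores, equal_devs):
    ad, sd = ad_and_sd(data)
    assert ad <= sd  # the A.D. is never greater than sigma
    print(round(ad, 2), round(sd, 2), round(ad / sd, 2))
# 23.93 30.98 0.77
# 10.0 10.0 1.0
print(round(math.sqrt(2 / math.pi), 4))  # 0.7979, the ratio for a normal curve
```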





E. THE COEFFICIENT OF RELATIVE VARIATION 

Karl Pearson developed a measure for comparing the relative variability of two or more variates whose means are dissimilar. The measure is known as the Coefficient of Relative Variation, symbolized by V, and is stated as follows:

$$V = \frac{100\sigma}{M} \qquad [7:10]$$

Pearson's Coefficient of Relative Variation


where the numerator is the standard deviation of the distribution and the 
denominator is the arithmetic mean. V measures, in percentages, the ratio 
of the standard deviation of a distribution to its mean. Thus, if V equals 
20%, this signifies that the standard deviation is 20% as large as the mean 
of the distribution from which it is derived. 
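Formula 7:10 amounts to a one-line function. The Python sketch below (hypothetical name) evaluates it for the summary values of Tests A and B used in the example that follows. The code itself cannot tell whether the comparison is meaningful; as the text argues, that depends on whether the scale has a true zero point.

```python
def relative_variation(sigma, mean):
    """Pearson's Coefficient of Relative Variation, V = 100 sigma / M (Formula 7:10), in per cent."""
    return 100 * sigma / mean


print(round(relative_variation(10, 50), 1))  # 20.0 (Test A)
print(round(relative_variation(10, 60), 1))  # 16.7 (Test B)
```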

If two variates have unequal standard deviations but their means are the same, the Coefficient of Relative Variation is unnecessary, because the deviations of both variates are relative to the same point of reference, i.e., the same mean. On the other hand, the standard deviations of two distributions may be the same, but their means dissimilar. In such cases, the variability of the distribution with the smaller mean is relatively greater than the variability of the other distribution.

Pearson developed this coefficient primarily for physical measures which 
have a true zero point on the scale of measures, and it is quite satisfactory for 
such variates. In the case of psychological measurements, however, test 
variables do not have a true zero point and consequently the Coefficient of 
Relative Variation can be logically used only when two or more distributions 
for the same variate are compared. Thus, it is sound to compare the relative 
variation of a group of boys with that of a group of girls for measures derived 
from the same test. But it is not sound to compare the relative variation of 
the results made by a group on one test with the results made by the same 
or a different group on a different test. This is true because the means of two 
different psychological variates are not fixed, but are rather a function of the 
characteristics of the construction of the test, as well as of the ability of the 
individuals taking it. This difference can be illustrated by the following 
example: 

A group of 100 boys is given two different tests, A and B. The results are 
summarized as follows: 



          N      M     σ     V
Test A    100    50    10    100(10)/50 = 20.0%
Test B    100    60    10    100(10)/60 = 16.7%


If these results could be taken at their face value, it could be concluded that the group was relatively more variable on Test A (V = 20.0%) than on Test B (V = 16.7%). These results, however, cannot be interpreted literally, because the variates are different. If, for example, there had been 20 additional easy items on Test A which all the boys in the group would have answered correctly, the results would have been as follows:



          N      M     σ     V
Test A    100    70    10    100(10)/70 = 14.3%
Test B    100    60    10    100(10)/60 = 16.7%


V for Test A is now seen to be 14.3% instead of 20.0%. But actually the variability of the group has not changed at all. Twenty points have been added to each individual score because of the inclusion of the 20 easy items, and the mean is consequently 70 instead of 50. This increase in the mean, with σ unchanged, therefore reduces the size of V as indicated above.
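The effect described above can be reproduced with a small hypothetical data set. The Python below uses four invented scores, [40, 60, 40, 60], chosen only because they have M = 50 and σ = 10 like Test A, and then adds 20 points to each, as the 20 easy items would. It uses the standard library's `statistics.fmean` and `statistics.pstdev` (divisor N). σ is unchanged while V falls from 20.0% to 14.3%.

```python
import statistics

# Hypothetical scores constructed to have M = 50 and sigma = 10, like Test A
test_a = [40, 60, 40, 60]
# The same scores after 20 easy items that every boy answers correctly
test_a_plus_20 = [x + 20 for x in test_a]

for data in (test_a, test_a_plus_20):
    m = statistics.fmean(data)
    s = float(statistics.pstdev(data))
    print(m, s, round(100 * s / m, 1))
# 50.0 10.0 20.0
# 70.0 10.0 14.3
```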

The Coefficient of Relative Variation is thus seen to be a capricious index 
of relative variability for psychological variables unless two or more groups 
are compared with respect to the same variate. In such comparisons, the scale 
of measures is the same and consequently the result is relatively unaffected 
by the particular character of the test itself. Consider, for example, the 
results of Test A given to a group of 100 boys and 100 girls: 

          N      M     σ     V
Boys      100    50    10    100(10)/50 = 20.0%
Girls     100    40     8    100(8)/40 = 20.0%

Thus the “absolute” variability (in terms of a) of the group of boys is 25% 
greater than that of the girls, since 100(10 — 8)/8 = 25%. The relative 
variability of the two groups, however, is the same, 20% in both cases. 

EXERCISES 

1. Under what circumstances are the mean, median, and mode of a distribution 
always equal in value? 

2. What is the essential difference between the mean and the median of a distribution? 

3. Under what circumstances is the mean a measure of central tendency of a dis¬ 
tribution? 

4. What is the essential difference between the standard deviation and the quartile 
deviation? 

5. What is the essential difference between the standard deviation and the average 
deviation? 

6. Under what circumstances can and cannot Pearson's coefficient, V, be used to compare the relative variations of two or more distributions?

7. Using the data in Table 5:14: 

a. Compute the means and standard deviations of the ages of the freshmen and
their best friends by Method I for the first 25 subjects (long method, data 
not grouped in the frequency distribution). 

b. Compute the means and standard deviations of the grade averages of the 
freshmen and their best friends by means of Method II (long method, data 
grouped). 





c. Compute the means and standard deviations of the intelligence test scores
of the freshmen and their best friends by Method III (short method with 
data grouped). 

8. Compare and interpret the results for college freshmen and their best friends 
obtained in the preceding exercise. 

9. Compute the mean and standard deviation for both variables in Table 6:7. 

10. In the preceding exercise, are the means in both cases as adequate as the medians 
for the purpose of summarizing the “central tendency” of each distribution? 

11. Apply Sheppard’s correction to the standard deviations for the age variable for 
the first 25 cases of both college freshmen and their best friends (Table 5:14), and 
compare these corrected values with those obtained in Exercise 7a. 

12. Compare the relative variation of (a) the grade scores, (b) the intelligence test 
scores, and (c) the ages of the college freshmen and their best friends (Table 5:14). 

13. For the data of Exercise 12, can the relative variability of the college freshmen’s
average grades be compared with that of their intelligence test scores? Why? 



CHAPTER 8 


Comparative Implications of the Normal, Bell-Shaped Curve

A. IMPLICATIONS OF M AND σ FOR NORMAL, BELL-SHAPED DISTRIBUTIONS

The mean and standard deviation have important theoretical as well as
practical implications when they are derived from a distribution that tends
to be of the normal, bell-shaped type. Although knowledge of their
mathematical properties under such circumstances is indispensable for the
problems of sampling and analytical statistics, awareness of some of the basic
implications of M and σ is also relevant, so that the meaning of these
measures may be broadened for purely descriptive problems. We shall therefore
describe some of the properties of the normal, bell-shaped distribution, the
general form of which is shown in Fig. 8:1. By describing the normal
distribution with respect to the first and second moments, we can at the same
time ascertain some of the “normal” implications of the mean and standard
deviation.

[Fig. 8:1. The Normal, Bell-Shaped Distribution (abscissa: Scale of Measures)]

The Mean as Point of Reference 

The mean is taken as the fundamental point of reference. All deviations 
are computed from it, and the algebraic sum of these deviations equals zero. 

The Mean as a Fulcrum 

The mean is at a point on the scale that cuts in half both the total weights 
of the measures and the total number of frequencies. Whereas the median 
divides a distribution of frequencies into two equal halves, regardless of the 
values or weights of the measures, the mean does this and more. It is
analogous to a fulcrum, for if a normal distribution were balanced on a knife-edge
at a point on the abscissa scale corresponding to the value of the mean, the 
distribution would be in perfect equilibrium. 

The Median and Mean 

The value of the median coincides with the value of the mean when a
distribution is bilaterally symmetrical, as is the case for the normal distribution.
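A minimal Python sketch of the properties just described, using a small hypothetical symmetric distribution (not data from the text): the deviations from the mean sum to zero, the positive and negative deviations balance as weights on either side of a fulcrum, and the median, together with the mode discussed next, falls at the mean.

```python
from statistics import mean, median, mode

# Hypothetical symmetric, unimodal distribution of 17 scores.
scores = [1] * 1 + [2] * 4 + [3] * 7 + [4] * 4 + [5] * 1

m = mean(scores)
deviations = [x - m for x in scores]

# The mean as point of reference: the deviations sum algebraically to zero.
print(sum(deviations))

# The mean as a fulcrum: the total "weight" above balances the total below.
above = sum(d for d in deviations if d > 0)
below = -sum(d for d in deviations if d < 0)
print(above, below)

# Bilateral symmetry makes the mean, median, and mode coincide.
print(m, median(scores), mode(scores))
```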

Uni-Modality and the Mode

The normal distribution has one modal point; in other words, it is a
uni-modal distribution with the greatest number of frequencies at the mean.
Hence, the value of the mode* coincides with the value of the mean and
median.

* The mode is defined as the value of the most frequently occurring measure in
a distribution, or, better, as the mid-point value of the class interval with
the greatest number of frequencies. It is sometimes used as a third type of
measure of central tendency for distributions of the normal, bell-shaped type.
However, it is also useful in describing the modal intervals of U-shaped (two
modes) and J-shaped (one major mode and one minor mode) types of distributions.

Bilateral Symmetry

The normal distribution is bilaterally symmetrical with respect to the
mean. Not only do the sums of all the positive and negative deviations equal
each other, and therefore sum to zero, but the algebraic sum of any part of
the deviations is also zero, regardless of the deviate distance taken above
and below the mean, so long as the two distances are equal. In other words,
the frequencies decrease at the same rate with each successive interval (of
whatever size) above the mean as with the corresponding interval below it.
The slope of the curve has the same magnitude, though opposite sign, at equal
distances above and below the mean.

Points of Inflection and σ

The standard deviation is the standard measure of variability for the normal
distribution. As the square root of the second moment about the mean, σ
measures a range above and below the mean that is exactly equal to the range
marked off by the points of inflection* on each side of the peak of the curve.
That is, if lines perpendicular to the abscissa are dropped from the point of
inflection on each side of the curve, they coincide with perpendicular lines
drawn to the curve from the abscissa at points exactly one standard deviation
above and below the mean.

Thus, the range of the standard deviation above and below the mean marks
the points on the curve at which the rate of decrease in frequencies changes.
Between the mean and one standard deviation distance (M ± 1σ), the rate
of decrease accelerates, whereas beyond M ± 1σ this rate decelerates. In
other words, the curve is convex (concave downward) between M − 1σ and
M + 1σ, and concave upward beyond these limits.
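The symmetry and the points of inflection can be verified numerically. This sketch assumes a hypothetical curve with M = 50 and σ = 10 and uses the analytic second derivative of the normal density, f''(x) = f(x)(u² − 1)/σ² with u = (x − M)/σ, whose sign tells whether the curve is concave downward or concave upward at x. The function names are illustrative.

```python
import math

def normal_density(x, m=0.0, sigma=1.0):
    """Height of a normal curve of unit area at abscissa x."""
    u = (x - m) / sigma
    return math.exp(-u * u / 2) / (sigma * math.sqrt(2 * math.pi))

def second_derivative(x, m=0.0, sigma=1.0):
    """Analytic second derivative of normal_density; its sign gives the curvature."""
    u = (x - m) / sigma
    return normal_density(x, m, sigma) * (u * u - 1) / (sigma * sigma)

M, SIGMA = 50.0, 10.0  # hypothetical mean and standard deviation

# Bilateral symmetry: equal heights at equal distances above and below M.
for d in (5.0, 10.0, 25.0):
    assert math.isclose(normal_density(M + d, M, SIGMA), normal_density(M - d, M, SIGMA))

# The curvature changes sign at M - 1 sigma and M + 1 sigma.
for x in (M - 1.1 * SIGMA, M - 0.9 * SIGMA, M + 0.9 * SIGMA, M + 1.1 * SIGMA):
    shape = "concave downward" if second_derivative(x, M, SIGMA) < 0 else "concave upward"
    print(f"x = {x:4.1f}: {shape}")
```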

Asymptotic Character of the Normal Curve 

When the normal curve is considered as representing a distribution of
frequencies, the number of frequencies is necessarily infinite, because otherwise
the surface of the curve would not be perfectly smooth and continuous. Now,
as the distance of deviations from the mean is increased, the proportion of
frequencies decreases. However, no matter how great a distance from the
mean is taken, the frequencies never equal zero. In other words, the tails of
the normal curve never reach the base line (abscissa) but are asymptotic
with respect to the x-axis.

The Practical Limits Equal M ± 3.0σ

For practical purposes, on the other hand, the frequencies of empirical
distributions that are of the normal type rarely extend beyond a distance
of 3.0 standard deviation units from the mean. As indicated in
Table 8:1, shortly to be discussed, 49.865% of the area (or frequencies) of
the normal distribution lies between the mean and 3.0σ. Therefore, 2(49.865),
or 99.73%, of the frequencies lie within the limits of M ± 3.0σ.

The proportion of the area between the mean and 5.0σ is seen in Table 8:1
to be 49.99997133%. The proportion of frequencies for an infinite distance
beyond 5.0σ is equal to 50.0 − 49.99997133, which is only 0.00002867%.
When both tails of the distribution are considered, only 2(0.00002867)%, or
0.00005734%, of the frequencies lie beyond the limits of M ± 5.0σ. This
amounts to less than 6 hundred-thousandths of 1% (0.00006 of 1%).
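The entries of Table 8:1 can be reproduced with the error function: the per cent of the area between the mean and a point z standard deviations away is 50·erf(z/√2). The sketch below (the function name is hypothetical) uses Python's `math.erf` to recover the 3.0σ and 5.0σ figures quoted above.

```python
import math

def area_mean_to_z(z):
    """Per cent of the area under a normal curve between the mean and z (Table 8:1)."""
    return 50.0 * math.erf(abs(z) / math.sqrt(2))

for z in (1.0, 1.3, 3.0, 5.0):
    print(f"M to {z} sigma: {area_mean_to_z(z):.8f}%")

# Per cent of the frequencies lying beyond M +/- z sigma (both tails together).
for z in (3.0, 5.0):
    print(f"beyond M +/- {z} sigma: {2 * (50.0 - area_mean_to_z(z)):.8f}%")
```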

σ as the Standard Measure of Variability

A deviation, as we have seen, is symbolized by x, x being equal to $X - M_x$.
Inasmuch as deviations for normal distributions are measured in terms
of σ, any such deviate distance is symbolized as x/σ, or $(X - M_x)/\sigma_x$.
Hence by this ratio it is possible to denote any measure of a distribution in
terms of a standard unit of differentiation or variability. Thus, any measure
which is at a point on the scale one standard deviation above the mean has a
value in standard deviation units of 1.0. For if $X = M_x + 1\sigma_x$, then

$$\frac{X - M_x}{\sigma_x} = \frac{(M_x + 1\sigma_x) - M_x}{\sigma_x} = \frac{\sigma_x}{\sigma_x} = 1.0$$

* The point at which a concave downward portion of a curve meets a concave upward
portion. Cf. Fig. 8:1.


Measures as z Scores

Deviations in units of the standard deviation are symbolized by z, and are
called z scores. Thus,

$$\frac{X - M_x}{\sigma_x} = \frac{X - M}{\sigma} = z \qquad [8:1] \quad z \text{ score}$$

When the subscripts are omitted, it is understood that the score (X), the
mean (M), and the standard deviation (σ) are all derived from the same
distribution of a variable. We said earlier that particular measures of a
distribution are symbolized by numerical subscripts; similarly, z scores for
particular measures are symbolized by the same subscripts. Thus, the value of a
score, $X_1$, converted to its deviate distance in σ units from the mean, is
symbolized by $z_{x_1}$ and is equal to

$$z_{x_1} = \frac{X_1 - M_x}{\sigma_x}$$

Measures below the mean have negative z score values and are always so 
labeled. Measures above the mean are positive and their z score values are 
usually written without the plus sign. 

z Scores Signify Relative Position in a Series * 

The concept of the positional meaning of a measure is of extreme significance 
to the concept of measurement itself in psychology and related fields. This is 
the case because (1) original scores or measures usually have little or no 
meaning until considered in relation to the distributions from which they 
are derived, and (2) measures of ability, attitudes, interests, etc., are not 
additive as are units of the c.g.s. system in physical measurements; at best,
psychological measures denote the position of an individual in a distribution. 
The functional implications of a given position in a scale are a problem for 
empirical determination. 

z scores, or their derivatives, are universally used to express the relative 
position of an original measure in the series of measures or distribution from 
which it is derived. Consequently, z scores and their derivatives are valuable 
for comparing the relative position of measures of different variables. 

* Cf. J. G. Peatman, “ On the Meaning of a Test Score in Psychological Measurement,” 
American Journal of Orthopsychiatry, 9:23-47, 1939; especially pp. 29 ff.




If a person receives a score of 90 on Test x and a score of 55 on Test y, 
these values of 90 and 55 are not directly comparable, because they are 
obtained from two different variables whose units of measurement are not 
the same. However, they can be made comparable in respect to their relative 
position in each distribution if they are converted to z score values and 
thereby expressed in terms of their standard deviate distance from their 
respective means. If the mean of the x variable equals 70 and its standard
deviation is 20, then

$$z_{x_1} = \frac{90 - 70}{20} = 1.0$$

And if the mean of the y variable equals 50 and its standard deviation is 5.0,
$z_{y_1}$ is also equal to 1.0, since (55 − 50)/5.0 = 1.0. In both cases,
therefore, the measures are in scale positions that are one standard deviation
above their respective means, despite the fact that the original value of one
measure is 90 and that of the other is 55.

In other words, if measures of two or more different distributions are located 
at the same abscissa points on normal distributions, they all have the same 
z score values, regardless of the magnitude of their original values. 
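The conversion in the example above is a one-line computation. A minimal sketch; the function name `z_score` is illustrative, and the means and standard deviations are those of Test x and Test y in the text.

```python
def z_score(x, m, sigma):
    """Deviate distance of an original score x from the mean m, in units of sigma."""
    return (x - m) / sigma

z_x1 = z_score(90, m=70, sigma=20)    # Test x: M = 70, sigma = 20
z_y1 = z_score(55, m=50, sigma=5.0)   # Test y: M = 50, sigma = 5.0
print(z_x1, z_y1)                     # 1.0 1.0

# A score below the mean receives a negative z score.
print(z_score(60, m=70, sigma=20))    # -0.5
```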


Centile Implications of Standard Measures 

The relative scale position of measures can be expressed, as just indicated, 
in terms of their standard deviate distance above or below the mean. Their 
position can also be interpreted in terms of centile values. This is done by 
differentiating the normal distribution into successive deviate distances from 
the mean and determining the proportion or percentage of frequencies found 
in the interval between the mean and any given distance. The unit of
differentiation used for this purpose is again the standard deviation, and the
deviate distances are therefore x/σ values, i.e., z scores.

Table 8:1 presents the differentiation of the total area (or frequencies) of a
normal distribution into fractional parts for deviate distances above or
below the mean, ranging from a distance of z = zero (the mean) to a
distance of z = 5.0. The proportion of the area (or frequencies) for any deviate
distance from the mean is given in percentages in the body of the table.

Table 8:1. Fractional Parts of the Total Area Under the Normal Probability Curve,
Corresponding to Distances on the Base Line Between the Mean and Successive
Points Laid Off from the Mean in Units of Standard Deviation

Example: Between the mean and a point 1.3σ, i.e., x/σ = 1.3, lies 40.32% of the
entire area under the curve, or 40.32% of the frequencies.

x/σ     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
0.0   00.00  00.40  00.80  01.20  01.60  01.99  02.39  02.79  03.19  03.59
0.1   03.98  04.38  04.78  05.17  05.57  05.96  06.36  06.75  07.14  07.53
0.2   07.93  08.32  08.71  09.10  09.48  09.87  10.26  10.64  11.03  11.41
0.3   11.79  12.17  12.55  12.93  13.31  13.68  14.06  14.43  14.80  15.17
0.4   15.54  15.91  16.28  16.64  17.00  17.36  17.72  18.08  18.44  18.79
0.5   19.15  19.50  19.85  20.19  20.54  20.88  21.23  21.57  21.90  22.24
0.6   22.57  22.91  23.24  23.57  23.89  24.22  24.54  24.86  25.17  25.49
0.7   25.80  26.11  26.42  26.73  27.04  27.34  27.64  27.94  28.23  28.52
0.8   28.81  29.10  29.39  29.67  29.95  30.23  30.51  30.78  31.06  31.33
0.9   31.59  31.86  32.12  32.38  32.64  32.89  33.15  33.40  33.65  33.89
1.0   34.13  34.38  34.61  34.85  35.08  35.31  35.54  35.77  35.99  36.21
1.1   36.43  36.65  36.86  37.08  37.29  37.49  37.70  37.90  38.10  38.30
1.2   38.49  38.69  38.88  39.07  39.25  39.44  39.62  39.80  39.97  40.15
1.3   40.32  40.49  40.66  40.82  40.99  41.15  41.31  41.47  41.62  41.77
1.4   41.92  42.07  42.22  42.36  42.51  42.65  42.79  42.92  43.06  43.19
1.5   43.32  43.45  43.57  43.70  43.82  43.94  44.06  44.18  44.29  44.41
1.6   44.52  44.63  44.74  44.84  44.95  45.05  45.15  45.25  45.35  45.45
1.7   45.54  45.64  45.73  45.82  45.91  45.99  46.08  46.16  46.25  46.33
1.8   46.41  46.49  46.56  46.64  46.71  46.78  46.86  46.93  46.99  47.06
1.9   47.13  47.19  47.26  47.32  47.38  47.44  47.50  47.56  47.61  47.67
2.0   47.72  47.78  47.83  47.88  47.93  47.98  48.03  48.08  48.12  48.17
2.1   48.21  48.26  48.30  48.34  48.38  48.42  48.46  48.50  48.54  48.57
2.2   48.61  48.64  48.68  48.71  48.75  48.78  48.81  48.84  48.87  48.90
2.3   48.93  48.96  48.98  49.01  49.04  49.06  49.09  49.11  49.13  49.16
2.4   49.18  49.20  49.22  49.25  49.27  49.29  49.31  49.32  49.34  49.36
2.5   49.38  49.40  49.41  49.43  49.45  49.46  49.48  49.49  49.51  49.52
2.6   49.53  49.55  49.56  49.57  49.59  49.60  49.61  49.62  49.63  49.64
2.7   49.65  49.66  49.67  49.68  49.69  49.70  49.71  49.72  49.73  49.74
2.8   49.74  49.75  49.76  49.77  49.77  49.78  49.79  49.79  49.80  49.81
2.9   49.81  49.82  49.82  49.83  49.84  49.84  49.85  49.85  49.86  49.86
3.0   49.865
3.5   49.97674
4.0   49.99683
4.5   49.99966
5.0   49.99997133

If an original score has a z value of 1.3, it is above the mean (since it is
positive); and it is at a point on the scale of measures such that 40.32% of
the area (or frequencies) lies between this point and the mean. This is
illustrated in Fig. 8:2.

If 40.32% of the frequencies are between the mean and a score that has a
z value of 1.3, then 90.32% of the frequencies of the distribution will be below
this point, and 9.68% will be above it. In other words, an original score whose
deviate distance in a distribution is 1.3 standard deviation units above the
mean is at a point that cuts the distribution into two parts such that the
lower part includes slightly more than 90% of the frequencies, and the upper
part includes the remainder.
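The step from Table 8:1 to the percentages below and above a point can be sketched with Python's `statistics.NormalDist`, whose `cdf` method gives the proportion of a normal distribution lying below a value. Here the standard normal curve (mean 0, σ 1) stands in for any normal distribution expressed in z scores; the function name is illustrative.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)

def percent_below(z):
    """Per cent of a normal distribution's frequencies lying below the z score."""
    return 100.0 * STANDARD_NORMAL.cdf(z)

below = percent_below(1.3)
print(f"z = 1.3: {below:.2f}% below, {100 - below:.2f}% above")
# z = 1.3: 90.32% below, 9.68% above
```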










[Fig. 8:2. The Location on a Normal, Bell-Shaped Distribution of a Measure 1.3
Standard Deviation Units above the Mean]

[Fig. 8:3. Centile Implications of Standard Deviation Distances and z Score Units
of the Normal, Bell-Shaped Distribution]


We saw in Chapter 6 that a measure that lies in this position on the centile
scale is in the 91st centile interval. Therefore, an original measure with a
z score value of 1.3 lies in the 91st centile interval of the normal
distribution.

The centile interval locations of z score values for a normal distribution
are summarized in Table 8:2, and the relationship between centile point
values and z score differentiations for such a distribution is illustrated in
Fig. 8:3. Thus, as indicated in the table, all z scores equal to or greater than
2.33 are in the interval C100; all z scores of −2.33 or less are in the first
centile interval, C1. The mean (z = 0.0) is in the 51st centile interval (as is
the median), in accordance with the principle that measures whose value
corresponds with the lower limit of an interval lie within that interval.
Actually, of course, the point value of the mean lies exactly at the mid-point
of the scale.

[Table 8:2. The Centile Intervals of Original Measures Converted to z Score
Values (Assuming Normal Variability)]
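The rule for assigning a z score to a centile interval can be written as a small function. This is a sketch under the assumptions stated in the text: the distribution is normal, and a measure whose point centile value falls exactly on an interval's lower limit is counted in that interval, so the interval number is the whole part of the per cent below, plus one, capped at 100. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def centile_interval(z):
    """Centile interval (1 to 100) of a measure with the given z score."""
    percent_below = 100.0 * NormalDist().cdf(z)   # standard normal curve
    return min(100, math.floor(percent_below) + 1)

for z in (-2.33, -1.0, 0.0, 1.0, 1.3, 2.33):
    print(z, centile_interval(z))
```

With these assumptions the function places z = 0.0 in the 51st interval and z = 1.3 in the 91st, as stated in the text.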

Commonly used points of reference for z score values between −3.0 and
3.0 are noted on the normal distribution shown in Fig. 8:3. Thus, measures 
that are one standard deviation above the mean are in the 85th centile interval, 
because the point centile value at z = 1.0 is 84.1. It follows that 84% of the 
frequencies of a normal distribution are below z = 1.0. Therefore, a measure 
whose position on the scale is one standard deviation above the mean exceeds 
in value 84% of the measures of the distribution. 

A measure that is one standard deviation below the mean lies in the 16th
centile interval because the centile point value of z = −1.0 is 15.9. Therefore,
nearly 16% of the measures of a normal distribution are lower in value than
one whose z score equivalent is −1.0. The range of M ± 1.0σ includes
approximately the middle 68% of the frequencies. Although this follows from the
preceding discussion, it can perhaps be computed more readily from the data
in Table 8:1 than observed from Fig. 8:3. The percentage of the frequencies
between M and M + 1.0σ is 34.13 in Table 8:1; twice this amount, 68.26,
therefore gives the percentage of frequencies lying within M ± 1.0σ.

At 2.0σ, the centile point value is given in Fig. 8:3 as 97.7. A z score of 2.0
therefore lies in the 98th centile interval, and about 2% of all the measures
of a normal distribution are greater than one whose value is 2.0 standard
deviation units above the mean. At −2.0σ, on the other hand, the centile
point value is 2.3, and therefore a z score of −2.0 lies in the 3rd centile
interval. Approximately 98% of all the measures of the distribution are greater
than one whose value is 2.0 standard deviation units below the mean. The
range of M ± 2.0σ includes approximately the middle 95% of the measures
of a normal distribution. (The area between M and M + 2.0σ, according to
Table 8:1, is 47.72%, and twice this figure gives 95.44% for M ± 2.0σ.)

Measures whose z values are 0.5 and −0.5 lie in the 70th and 31st centile
intervals respectively. About 30% of the measures of a normal distribution are
thus greater in value than a measure one-half a standard deviation above the
mean, and about 30% are less in value than a measure one-half a standard
deviation below the mean. The range of M ± 0.5σ thus includes approximately
the middle 40% of the frequencies. (The area between M and M + 0.5σ, according
to Table 8:1, is 19.15%; twice this figure gives 38.30% for M ± 0.5σ.)

Measures whose z values are 1.5 and −1.5 lie in the 94th and 7th centile
intervals respectively. About 93% of the measures of a normal distribution
are thus less in value than a measure 1½ standard deviations above the mean,
and about 93% are greater in value than a measure the same distance below
the mean. The range of M ± 1.5σ includes approximately the middle 85%
of the frequencies. (The area between M and M + 1.5σ, according to Table 8:1,
is 43.32%; twice this figure gives 86.64% for M ± 1.5σ.)



Summary of Commonly Used Measures of Dispersion About the Mean

The percentages of the total frequencies of a normal distribution that are 
included within the limits of various z score values are summarized in 
Table 8:3. These percentages, which are taken from Table 8:1, are commonly 
used reference values for the normal distribution. The student will be wise to 
memorize Table 8:3 because a ready knowledge of these dispersions facilitates 
the interpretation of the mean and standard deviation of distributions that 
are of the normal type. 

Table 8:3. The Dispersion of Frequencies About the Mean of a Normal
Distribution

Range in Terms of      Per Cent of Frequencies Included Within the Range
M and σ                (Rounded to Nearest Unit Value)

M + or − 0.5σ           19%
M + and − 0.5σ          38%
M + or − 0.6745σ        25%
M + and − 0.6745σ       50%
M + or − 1.0σ           34%
M + and − 1.0σ          68%
M + or − 1.5σ           43%
M + and − 1.5σ          87%
M + or − 2.0σ           48%
M + and − 2.0σ          95%
M + or − 2.5σ           49%
M + and − 2.5σ          99%
M + or − 3.0σ           50−%
M + and − 3.0σ          100−%


The range of M ± 0.6745σ has been included in this table because it defines
the range of the middle 50% of the frequencies of a normal distribution and
is the basis for the probable error (P.E.) in sampling distributions (cf.
Chapter 13, Section E). The P.E. of a measure whose sampling distribution is
of the normal, bell-shaped type is always 0.6745σ, where σ is the standard
deviation of that sampling distribution.
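The percentages in Table 8:3, and the constant 0.6745 on which the probable error rests, follow from the normal curve itself. A minimal sketch using `statistics.NormalDist`; the one-sided column corresponds to "M + or −" and the two-sided column to "M + and −". The variable names are illustrative.

```python
from statistics import NormalDist

nd = NormalDist()  # standard normal curve: mean 0, sigma 1

rows = {}
for k in (0.5, 0.6745, 1.0, 1.5, 2.0, 2.5, 3.0):
    one_side = 100.0 * (nd.cdf(k) - 0.5)  # M to M + k sigma ("+ or -")
    both_sides = 2.0 * one_side           # M - k sigma to M + k sigma ("+ and -")
    rows[k] = (one_side, both_sides)
    print(f"{k:<7} {one_side:8.2f}% {both_sides:8.2f}%")

# Multiplier of sigma that encloses the middle 50% of a normal distribution.
pe_factor = nd.inv_cdf(0.75)
print(round(pe_factor, 4))  # 0.6745
```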

The Normal Probability Curve 

The curve of normal probability is the normal, bell-shaped distribution 
shown in Fig. 8:1. It is often referred to as the normal curve of error, because 
random errors of measurement commonly yield distributions of this type. 











At this point it is sufficient to note that the mean and standard deviation,
derived from the first and second moments, are the standard measures for
probability and error distributions of the normal type. Extensive use will be
made later of the properties and implications of normal probability in the
development of Tests of Significance in the problems of analytical and sampling
statistics.

The Formula for the Normal Curve 

The equation for the normal distribution, differentiated in terms of standard
deviation units, is as follows:

$$y = \frac{N}{\sigma\sqrt{2\pi}}\, e^{-x^2/2\sigma^2} \qquad [8:2] \quad \text{Normal probability function in terms of } \sigma$$

This is the normal probability function. In plotting a normal distribution 
for N frequencies, it is not necessary to work directly from this function; 
rather, one can utilize tables of ordinate values (y) for different values of x/σ
which have been developed for distributions whose total area is taken as 
unity (cf. Table 1, Appendix B). 
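A direct transcription of formula [8:2] as a Python sketch. The function name and the hypothetical distribution (N = 1000, σ = 10, class interval i = 5) are illustrative; multiplying an ordinate by a narrow class-interval width i only approximates the frequency expected in an interval centred at x.

```python
import math

def normal_ordinate(x, n, sigma):
    """Height y of the normal curve (formula 8:2) at deviation x from the mean,
    for a distribution of n frequencies with standard deviation sigma."""
    return n / (sigma * math.sqrt(2 * math.pi)) * math.exp(-x ** 2 / (2 * sigma ** 2))

# Unit area and unit sigma: the form used in tables of ordinates.
print(round(normal_ordinate(0.0, 1, 1.0), 4))   # 0.3989
print(round(normal_ordinate(1.0, 1, 1.0), 4))   # 0.242

# Hypothetical distribution: N = 1000, sigma = 10, class interval i = 5.
n, sigma, i = 1000, 10.0, 5
expected = [round(i * normal_ordinate(x, n, sigma), 1) for x in (-10, -5, 0, 5, 10)]
print(expected)
```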

Relationship Between Various Measures of Variability in a 
Normal Distribution 

We indicated earlier that σ is greater than A.D. for any distribution.
Both these measures of variability are larger than the quartile deviation and
the tercile deviation, but smaller than the D range. When a distribution can
be assumed to be normal, the relation between these various measures is as
follows:

σ = 2.317 T.D. = 1.483Q = 1.253 A.D. = .390D

A.D. = 1.849 T.D. = 1.183Q = .798σ = .311D

Q = 1.563 T.D. = .845 A.D. = .6745σ = .263D
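Several of these constants can be checked from the normal curve itself. The sketch assumes that the D range is the distance between the 10th and 90th centile points, which reproduces the .390, .311, and .263 factors above; the tercile deviation is omitted because its exact definition is not restated here. For a normal curve A.D. = σ√(2/π) and Q = 0.6745σ.

```python
import math
from statistics import NormalDist

nd = NormalDist()   # standard normal curve
sigma = 1.0

ad = sigma * math.sqrt(2 / math.pi)                # average deviation
q = sigma * nd.inv_cdf(0.75)                       # quartile deviation, (Q3 - Q1) / 2
d = sigma * (nd.inv_cdf(0.90) - nd.inv_cdf(0.10))  # assumed: 10th-90th centile range

print(f"sigma = {sigma / q:.3f}Q = {sigma / ad:.3f}A.D. = {sigma / d:.3f}D")
print(f"A.D. = {ad / q:.3f}Q = {ad / sigma:.3f}sigma = {ad / d:.3f}D")
print(f"Q = {q / ad:.3f}A.D. = {q / sigma:.4f}sigma = {q / d:.3f}D")
```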

B. THE USE OF z SCORES AND STANDARD SCORES FOR COMPARATIVE PURPOSES

We have seen that the standard deviation has come to be used as the
standard measure of variability of the normal, bell-shaped type of
distribution. Original scores of a variable can readily be expressed in units of σ
by conversion to z scores, where

$$z_x = \frac{X - M_x}{\sigma_x}$$

And, as indicated in Fig. 8:3, the positional implications of z scores can be
interpreted in terms of centiles. However, z scores are somewhat inconvenient
to use, particularly with machine methods, because of the presence of
negative numbers: all original scores below the mean have negative z score
values. In order to obviate the use of negative numbers, a variety of
conversion scales have been developed, the most satisfactory being the Standard
score scale.*

Standard Scores (S) 

Original measures or scores of a variable are readily converted into Standard
scores by adding 5.0 to each of the z score values of the originals. Thus,

$$S = 5.0 + z_x \quad \text{or} \quad S = 5.0 + \frac{X - M_x}{\sigma_x} \qquad [8:3] \quad \text{Standard score}$$

where X is the original score and M and σ are the mean and standard
deviation of the variable or distribution of which the original score is a member.
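Formula [8:3] is a simple shift of the z score. A minimal sketch (function names hypothetical), using a hypothetical test with M = 50 and σ = 10, and computing σ over all N scores when a whole distribution is converted:

```python
from statistics import mean, pstdev

def standard_score(x, m, sigma):
    """Formula 8:3: S = 5.0 + z, where z = (x - m) / sigma."""
    return 5.0 + (x - m) / sigma

def standard_scores(scores):
    """Convert a whole distribution of original scores to Standard scores."""
    m, s = mean(scores), pstdev(scores)
    return [round(standard_score(x, m, s), 1) for x in scores]

print(standard_score(65, m=50, sigma=10))   # 6.5
print(standard_score(30, m=50, sigma=10))   # 3.0
print(standard_scores([40, 60, 40, 60]))    # [4.0, 6.0, 4.0, 6.0]
```

Because every z score is shifted by 5.0, the Standard scores keep σ = 1.0 and are positive for all measures within five standard deviations of the mean.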

The relation of Standard scores to z scores and centile ranks is shown in 
Fig. 8:4. It is to be emphasized that the normal, bell-shaped distribution is 
the basis for interpreting a Standard score scale as a yardstick for the
differentiation of the measures of a variable. Under such circumstances Standard
scores are a convenient and appropriate device for (1) the development of 
tables of norms, as in Table 8:4, and (2) the comparison of one individual’s 
scores on different tests or variables, as in the profile charts shown in Figs. 8:6 
and 8:7. 


[Fig. 8:4. Standard Score Scale with z Score and Centile Point Equivalents]

S     2.0    3.0    4.0    5.0    6.0    7.0    8.0
z    −3.0   −2.0   −1.0    0.0    1.0    2.0    3.0
C     0.1    2.3   15.9   50.0   84.1   97.7   99.9


* Although z scores themselves are sometimes referred to as Standard scores, we shall 
limit the use of the latter term to the conversion scale in which the mean is taken as equal 
to 5.0 and, as for z scores, the standard deviation remains 1.0. Cf. W. V. Bingham, Aptitudes 
and Aptitude Testing, Harper, New York, 1937, chap. 19. 











For practically all distributions of test scores, a Standard score range of 
from 2.0 to 8.0 is adequate. In fact, for many distributions, like that in 
Table 8:4, the actual range of scores is likely to be less than this range. 

Standard scores are usually written to one decimal place. This means that 
for a table of norms for a psychological test, there can be as many as 60
differentiations within the limits of 2.0 and 8.0. A scale with 60 intervals is more
than adequate for any test; in fact, most tests are not sufficiently reliable to 
warrant the use of so many intervals. In most cases, 10 or 20 intervals are 
adequate. 


Standard Score Norms 

The development of test norms in terms of Standard scores and their
centile equivalents is well illustrated by Table 8:4 for the Bennett Test of
Mechanical Comprehension. The particular norms in this table were developed
for use with fireman and policeman candidates, and the distribution of scores
from which the norms were obtained is shown in Fig. 8:5.

Table 8:4. Standard Score Norms for the Bennett Mechanical Comprehension
Test, Form AA, Candidates for Policeman and Fireman Positions

Method I

Original Score    Standard Score    Centile Interval
56                7.0               98
51                6.5               94
46                6.0               85
41                5.5               70
36                5.0               51
31                4.5               31
26                4.0               16
20                3.5               7
15                3.0               3
10                2.5               1

Method II

Original Scores   Standard Score Interval   Centile Limits of Interval
56 and above      7.0 and above             97.7 to 100
51 to 55          6.5 to 6.9                93.3 to 97.6
46 to 50          6.0 to 6.4                84.1 to 93.2
41 to 45          5.5 to 5.9                69.2 to 84.0
36 to 40          5.0 to 5.4                50.0 to 69.1
31 to 35          4.5 to 4.9                30.8 to 49.9
26 to 30          4.0 to 4.4                15.9 to 30.7
20 to 25          3.5 to 3.9                6.7 to 15.8
15 to 19          3.0 to 3.4                2.3 to 6.6
10 to 14          2.5 to 2.9                0.6 to 2.2
9 and below       less than 2.5             0 to 0.5

[Fig. 8:5. Bennett’s Distribution of Mechanical Comprehension Test Scores for
1838 Policemen and Firemen*]

* The original data, which were used also in developing Table 8:4, were
furnished by The Psychological Corporation, New York, through the courtesy of
Dr. George K. Bennett.

The standardizing group, i.e., the group to whom the test was administered
for the purpose of developing norms, consisted of 1838 policemen and firemen.
Their mean score on the test was 35.6 and the standard deviation of the
distribution was 10.1. On the assumption that the distribution in Fig. 8:5 is
sufficiently close to the normal, bell-shaped type, the Standard score norms in
Table 8:4 were developed as follows:

$$S = 5.0 + z = 5.0 + \frac{X - M_x}{\sigma_x} = 5.0 + \frac{X - 35.6}{10.1}$$

To find X, an original score value, for any value of S:

$$X = M_x - 5.0\sigma_x + S\sigma_x \qquad [8:4] \quad \text{To find the original score value of any Standard score of a distribution}$$

For the standardizing group,

X = 35.6 − 5.0(10.1) + S(10.1) = 10.1S − 14.9


Thus, with this formula for X and the particular mean and σ values of the
standardizing group of 1838 scores, any original score value for a given
value of S can be computed, as follows:

For S = 5.0: X = 10.1(5.0) − 14.9 = 35.6 (the mean)

For S = 3.0: X = 10.1(3.0) − 14.9 = 15.4 (or 15)

For S = 6.5: X = 10.1(6.5) − 14.9 = 50.75 (or 51)

Other values of X for given values of S are computed in the same way.
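The Method I column of Table 8:4 can be regenerated from formula [8:4] and the group's mean and σ. This sketch uses decimal arithmetic with half-up rounding so that values such as 25.5 and 50.75 round the way the table does, and it assigns centile intervals by the lower-limit rule used earlier in the chapter, assuming normality. The function names are illustrative.

```python
import math
from decimal import Decimal, ROUND_HALF_UP
from statistics import NormalDist

# Mean and sigma of the standardizing group of 1838 scores, as given in the text.
M, SIGMA = Decimal("35.6"), Decimal("10.1")

def original_score(s):
    """Formula 8:4: X = M - 5.0*sigma + S*sigma, computed exactly in decimal."""
    return M - 5 * SIGMA + Decimal(str(s)) * SIGMA

def centile_interval_for_s(s):
    """Centile interval of Standard score s, assuming a normal distribution."""
    percent_below = 100.0 * NormalDist().cdf(s - 5.0)
    return min(100, math.floor(percent_below) + 1)

standard = [7.0, 6.5, 6.0, 5.5, 5.0, 4.5, 4.0, 3.5, 3.0, 2.5]
original = [int(original_score(s).quantize(Decimal("1"), rounding=ROUND_HALF_UP))
            for s in standard]
centiles = [centile_interval_for_s(s) for s in standard]

print("Original  Standard  Centile interval")
for x, s, c in zip(original, standard, centiles):
    print(f"{x:>8}  {s:>8.1f}  {c:>16}")
```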

Two methods for presenting the Standard score norms and their centile
equivalents are shown in Table 8:4 for the Bennett Mechanical Comprehension
Test, Form AA. Method I is perhaps used more generally than Method II.
The latter is simply a clarification of what the norms derived by Method I
imply. Thus, according to Method I, an original score of 36 is equal to a
Standard score of 5.0, and an original score of 41 is equal to a Standard score
of 5.5. If a person receives a score of 38 on the test, his Standard score lies
in the interval between 5.0 and 5.5. This is shown clearly by Method II, in
which the scores are given by class intervals.

The Standard score test norms in this table are set up in successive class 
intervals, each of which is equal to one-half standard deviation. This division 
of the scale would theoretically yield 12 intervals between Standard score 
values of 2.0 and 8.0. However, as indicated in Fig. 8:5, the actual distribution 
of the 1838 scores is not bilaterally symmetrical. That is, the tails of the 
distribution are not equidistant from the modal point of the curve; rather, 
the distribution tends to be skewed in the direction of the lower scores. The 
mean of 35.6 is consequently just below the modal interval, which ranges 
from 36 to 38. Furthermore, the maximum possible score that can be made 
on this test is 60, and therefore the tail at the right, or toward the higher 
scores, could not extend relatively as far from the mean as the tail at the
lower end of the distribution. The actual range of scores of the distribution 
is thus less than the theoretical Standard score range of from 2.0 to 8.0. 
Eleven Standard score intervals, rather than 12, suffice to give a table of 
norms in terms of the actual data obtained from this group of policemen and 
firemen.* 


The Standard Score Profile Chart or Psychograph 

Centile vs. Standard Score Scales of Test Difficulty

Centile scores alone can be used for the development of norms. They are
particularly desirable in comparing test results for distributions not all of
which tend to be of the normal bell-shaped type. When, however, the distribu­
tions whose scores are to be compared are of the normal type, the Standard
score scale is preferable to the centile scale because of the great concentration
of cases at the center of the distributions. That is, the centile scale is too
likely to suggest that differences in test difficulty are the same throughout the
scale: that the difference in difficulty between C50 and C60 is the same as that
between C80 and C90. Actually, for most tests, a subject needs to achieve
success on considerably fewer items to raise his place in the centile scale from
C50 to C60 than to raise it from C80 to C90.
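If we assume a normal distribution, this point can be checked numerically. Equal 10-point steps on the centile scale correspond to unequal steps on the Standard score scale. The sketch below uses Python's standard-library `statistics.NormalDist`. Because S = z + 5.0, a gap in z is the same size as the gap in S.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # z value below which the given proportion of cases falls

gap_middle = z(0.60) - z(0.50)  # Standard score units between C50 and C60
gap_upper = z(0.90) - z(0.80)   # Standard score units between C80 and C90

print(f"C50 to C60: {gap_middle:.2f} Standard score units")
print(f"C80 to C90: {gap_upper:.2f} Standard score units")
```

The first gap comes to about 0.25 of a standard deviation and the second to about 0.44. The same 10 centile points thus span a wider stretch of the scale near the upper tail than at the center.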

Another advantage of a Standard score scale over a centile scale is the 
fact that it gives more consideration to the difficulty of the test. This does 
not mean that the standard deviation itself is a basis for differentiating test 
scores on a scale that yields equal units of difficulty. In other words, it does 

* It should be noted that Bennett’s norms for policeman and fireman candidates, pub­
lished with the Mechanical Comprehension Test, are presented in terms of centiles
(20 intervals), rather than in terms of Standard scores. However, Standard scores can be
used for somewhat skewed distributions such as that in Fig. 8:5, if their divergence from
the normal, bell-shaped type of distribution is no greater than might be expected on the
basis of chance. See chap. 15, Section A, for a statistical test of the possible significance of
the divergence of a unimodal, skewed distribution from the normal bell-shaped type.



USE OF z AND STANDARD SCORES FOR COMPARATIVE PURPOSES 189 


not follow that any two standard deviation intervals are equal in difficulty; 
rather, the Standard score scale is more closely related to the differences in 
difficulty than is the centile scale. However, the spacing of centile intervals 
on a scale can be adjusted to correspond to intervals based on the standard 
deviation. This adjustment is illustrated in the profile charts in Figs. 8:6
and 8:7.

Prerequisites for a Profile Chart

The purpose of a profile chart is to provide a graphic device by which a 
person’s placement on one test can be directly compared with his placement 
on other tests. Thus, a profile chart is a means of summarizing and comparing 
one person’s results on a battery of tests. It shows at a glance whether he did 
well on all the tests, or poorly, or whether his performance was scattered. 
One assumption basic to the development of a profile chart cannot be over­
emphasized: The distributions for each test whose scores are compared must
be derived from the same group of individuals (or, in the case of sampling
statistics, from samples of the same type or kind of population). In other
words, a profile chart is developed on the basis of norms that have been
established for each test whose scores are to be compared, and the norms in
turn must be obtained from the same group of subjects.

The importance of this basic assumption can be brought out by the following
example. Mr. Jones is given two different tests, and his performance on the first
is compared with his performance on the second by means of the norms
provided with each test. According to these norms, he has a Standard score
of 6.0 on the first test and a Standard score of 5.0 on the second. However,
the norms for the second test were developed from the test results of a
restricted group of people who were above average in ability, whereas the
norms for the first test were developed from the test results of a group that
was not restricted in the general range of ability. Mr. Jones’ Standard scores
of 6.0 and 5.0 are therefore not on comparable scales, and they should not
be plotted on the same profile chart. His Standard score of 5.0 on the second
test would undoubtedly have been higher had it been based on norms derived
from a group, like the first, whose range of ability was not so restricted.
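The effect of the norm group can be sketched with hypothetical figures. The means and standard deviations below are invented for illustration and are not taken from any published norms. The same raw score yields a lower Standard score when it is referred to a restricted, above-average group.

```python
def standard_score(x, norm_mean, norm_sd):
    """Standard score S = 5.0 + (X - M) / sigma, relative to a given norm group."""
    return 5.0 + (x - norm_mean) / norm_sd

raw = 60.0  # hypothetical raw score on one test

# Hypothetical norm groups: an unrestricted group and a restricted, above-average group
s_general = standard_score(raw, norm_mean=50.0, norm_sd=10.0)
s_selected = standard_score(raw, norm_mean=60.0, norm_sd=8.0)

print(s_general, s_selected)
```

The raw performance is identical in both cases. Only the reference group changes, and with it the Standard score (6.0 against 5.0). This is why scores referred to different groups cannot share one profile chart.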

The essential characteristic of the profile chart, then, is that it provides
a standardized matrix that can be used for comparing an individual’s
placement or position on two or more tests. Unless the norms for each test
result are derived from the same group or type of population, ambiguous,
if not bizarre, interpretations will result. Obviously the Standard score of
one test, based on adult norms, cannot be compared with the Standard score
of another test, based on norms for 10-year-olds. Nor should norms derived
from, say, a group of college graduates be used on the same profile chart with
those derived from adults generally. The point of reference on the chart is
the center vertical line that represents the mean (Standard score of 5.0) of
each test whose scores are to be compared. Only if the means for each test
are derived from the same group will they be comparable. Similarly, the
scale of differentiation on a profile chart is established in terms of the standard
deviation of each test. Only if the standard deviations of each test are derived
from the same group, whose results approximate normal, bell-shaped distri­
butions, will they be satisfactory for comparisons made in terms of Standard
scores.


Fig. 8:6. Individual Psychograph or Profile Chart*

SUBJECT A
Standard score scale (top): 2.0 to 8.0, marked at intervals of 0.5; centile scale along the bottom

Tests                                   Raw Score   S Score
Revised Alpha
  Scholastic Aptitude                      168         6.7
  Items Attempted                          168         6.0
  Per Cent Correct                          95         6.3
Columbia Vocabulary                         83         5.4
Potts-Bennett
  Science Information                       83         6.9
  Paragraph Comprehension                   24         6.3
  Specialized Vocabulary                    35         6.7
  Speed of Reading                          44         6.0
  Arithmetic Processes                      27         6.4
  Arithmetic Reasoning                      15         6.3
  General Information                       29         6.5
MacQuarrie Mechanical Aptitude
  Motor Speed and Coordination             105         5.8
  Mechanical Insight                       122         7.3
Bernreuter Personality Inventory
  Emotional Stability                                  5.6
  Self-Sufficiency                                     5.1
  Extroversion                                         5.7
  Dominance                                            5.5

* Data for this person, courtesy of Dr. George K. Bennett, The Psychological Corpora­
tion, New York.


The Construction of a Profile Chart 

Fig. 8:7. Individual Psychograph or Profile Chart*

SUBJECT B
Standard score scale (top): 3.72, 4.16, 4.75, 5.25, 5.84, 6.28
Centile scale (bottom):     C10,  C20,  C40,  C60,  C80,  C90
Descriptive intervals: Poor, Only Fair, Low Average, Average, High Average, Good, Very Good

Tests                                   Raw Score   S Score
Revised Alpha
  Scholastic Aptitude                      128         4.8
  Items Attempted                          142         4.6
  Per Cent Correct                          90         5.3
Columbia Vocabulary                         74         4.6
Potts-Bennett
  Science Information                       36         3.2
  Paragraph Comprehension                   11         4.0
  Specialized Vocabulary                    18         4.6
  Speed of Reading                          29         4.0
  Arithmetic Processes                      18         5.4
  Arithmetic Reasoning                       7         4.2
  General Information                       15         4.2
MacQuarrie Mechanical Aptitude
  Motor Speed and Coordination              92         4.9
  Mechanical Insight                        72         4.3
Bernreuter Personality Inventory
  Emotional Stability                                  6.9
  Self-Sufficiency                                     6.8
  Extroversion                                         6.8
  Dominance                                            7.1

* Data for this person, courtesy of Dr. George K. Bennett, The Psychological Corpora­
tion, New York.

Two variations in the Standard score profile chart are shown in Figs. 8:6
and 8:7. Both are similar in that the tests whose scores are compared are
listed down the chart on the left-hand side; vertical lines are employed to
mark different points on the test scales; and the basis for these differentiations
of the test results is the Standard score scale. In Fig. 8:6 the practical range
of possible test results, from a Standard score of 2.0 to one of 8.0, is scaled
across the top of the chart in equally spaced intervals of one standard devia­
tion each; and rectangles are employed to represent the subject’s placement,
or relative position, on each of the 17 test variables. Fig. 8:7, on the other
hand, is so constructed that the D range (from C10 to C90) for the middle
80% of the cases is enlarged; descriptive terms are employed for four decile
intervals (two at each extreme) and for three quintile intervals at the center
of the scale; and all such intervals are based on the Standard score scale, as
indicated across the top of the chart.* The individual’s placement on each of
the variables is represented in Fig. 8:7 by a cross, rather than by a rectangle
projected laterally from the mean point of reference as in Fig. 8:6.
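The cross-type chart of Fig. 8:7 can be imitated roughly in plain text. The sketch below assumes a scale from 2.0 to 8.0 in steps of 0.25, with the mean (S = 5.0) marked by a vertical bar and the subject's placement marked by an X. The function name and layout are hypothetical. They are not the construction used for the published charts.

```python
def profile_row(name, s, lo=2.0, hi=8.0, step=0.25, width=28):
    """One text row of a profile chart: '|' marks the mean (S = 5.0), 'X' marks the score S."""
    n_cells = int(round((hi - lo) / step)) + 1
    cells = ["."] * n_cells
    cells[int(round((5.0 - lo) / step))] = "|"      # reference line at the mean
    clipped = min(max(s, lo), hi)                    # keep extreme scores on the chart
    cells[int(round((clipped - lo) / step))] = "X"   # subject's placement
    return f"{name:<{width}}" + "".join(cells)

# A few of Subject B's Standard scores from Fig. 8:7
for test, s in [("Science Information", 3.2), ("Arithmetic Processes", 5.4),
                ("Dominance", 7.1)]:
    print(profile_row(test, s))
```

Marks to the left of the bar fall below the mean of the normative group, and marks to the right fall above it.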

The profile chart shown in Fig. 8:7, with descriptive terms for the decile 
and quintile intervals, is probably more widely used than the straight Standard 
score chart shown in Fig. 8:6. It should be emphasized, however, that the 
Standard score scale is used for both. In Fig. 8:7 this device is combined with 
the centile scale. 

Both profile charts represent the results obtained on a battery of tests by 
two women applicants for admission to a school of nursing. The scores of each
candidate are compared with the results obtained on these 17 test variables by 10,000
applicants for admission to schools of nursing. In other words, these 10,000
cases constitute the normative group from whose test results on all 17 variables
the Standard score scale was developed for each chart. Thus the basic
requirement of a profile chart is met, namely, that the Standard scores must 
be based on means and standard deviations of tests all of which have been 
administered to the same group or kind of population. 

It will be observed that fewer than 17 different tests were administered to these
two candidates, despite the fact that their results on 17 different variables 
are compared. The first test listed, the revised Army Alpha, was scored three 
ways, as indicated, to yield three variables. The Bernreuter Personality 
Inventory was also scored several ways to yield the four variables indicated. 
The Potts-Bennett Tests comprise a series of tests within a battery, each of 
which is scored separately rather than in terms of an over-all score. 

The original scores made by each candidate on each of the first 13 variables 
are given in the first column at the right of the test names. The corresponding 
Standard scores are given in the next column at the right, and the results on 
the Bernreuter Personality Inventory are also indicated in this column. The
general order in which the groups of variables are presented is somewhat arbi­
trary; however, two or more variables derived from the same test or inventory
are of course listed together. It is because the general order is arbitrary that 
separate rectangles (as in Fig. 8:6) or crosses (as in Fig. 8:7) are used to denote 
the subject’s position on each variable, instead of the successive crosses being 
connected with straight lines down the page. However, it was the latter type 
of chart that gave rise to the term “profile.” 

It is apparent that the patterns of each applicant’s psychographs differ 
considerably. Thus Subject A (Fig. 8:6) is consistently above average on
all the variables, and scores “Good” or “Very Good” on all the test variables
except the Columbia Vocabulary. Her results on the Bernreuter are “Above
Average” or “Average.” Subject B, on the other hand, gives a considerably


* Although the outer limits of the first and tenth decile intervals are theoretically infinite, they
are necessarily limited on the profile chart. These limits are S scores of 3.375 and 6.625.





more varied psychograph (Fig. 8:7), with a range from “Poor” on the Science 
Information Test to “Very High” on the Bernreuter Personality Inventory.
Because of her excellent personality ratings, Subject B may prove to be the 
better nurse. The question in her case is whether she has sufficient aptitude 
to take the training required by the particular school of nursing to which she 
applied. If there were a shortage of candidates, she might well be considered; 
but if there were so many candidates that only a small proportion could be 
admitted, the final decision would have to be based on a consideration of the 
results for many other candidates as well as on Subject B’s personality, 
history, and test performance. As for the test performance itself, it is to be 
emphasized that tests do not all have the same degree of reliability or validity 
(cf. Chapter 17, Section A), and their differences in this respect must be 
taken into account. Subject B, for example, has at least a “Low Average” 
rating in most of the more reliable variables. 

The centile equivalents of the Standard score divisions of the psychographs
in Figs. 8:6 and 8:7 are obtained by reference to Table 8:1. In order to deter­
mine the Standard score value that will be at C90 on the scale, the score value
is located at a point that divides the distribution into two parts, with 10%
beyond C90. This is the point that marks the limit of 40% of the area above
the mean. We locate in Table 8:1 the nearest value to 40, and we find it in
the row for x/σ = 1.2 (or a Standard score of 6.2, since S = z + 5.0 =
1.2 + 5.0 = 6.2) and in the next to the last column at the right, headed .08.
S is therefore 6.28, or 6.3 when rounded to one decimal place.
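The same division points can be computed directly instead of being read from Table 8:1. The sketch below swaps the table lookup for the inverse of the normal cumulative distribution, using Python's `statistics.NormalDist.inv_cdf`, and then adds 5.0 to convert z to S. The function name is hypothetical.

```python
from statistics import NormalDist

def standard_score_at_centile(c):
    """Standard score S at centile c: the z below which c% of a normal distribution lies, plus 5.0."""
    return 5.0 + NormalDist().inv_cdf(c / 100)

for c in (10, 20, 40, 60, 80, 90):
    print(f"C{c}: S = {standard_score_at_centile(c):.2f}")
```

Rounded to two decimals, these values reproduce the division points used in Fig. 8:7, including S = 6.28 at C90.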

The necessary values for the divisions used in Fig. 8:7 are summarized as
follows:

Centile range      Standard score (S)     Interval            Descriptive term
C90 or better      6.28 or better         tenth decile        “Very Good”
C80 to C90         5.84 to 6.28           ninth decile        “Good”
C60 to C80         5.25 to 5.84           fourth quintile     “High Average”
C40 to C60         4.75 to 5.25           third quintile      “Average”
C20 to C40         4.16 to 4.75           second quintile     “Low Average”
C10 to C20         3.72 to 4.16           second decile       “Only Fair”
Less than C10      less than 3.72         first decile        “Poor”
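Placing a Standard score in one of these seven intervals is a simple lookup. The summary gives ranges such as "5.84 to 6.28" without saying which interval owns a boundary value. The sketch below assumes that a score equal to a boundary belongs to the higher interval, which matches "C90 or better." It uses Python's `bisect` module. The names are hypothetical.

```python
from bisect import bisect_right

# Standard score boundaries (C10, C20, C40, C60, C80, C90) and the seven descriptive terms
BOUNDARIES = [3.72, 4.16, 4.75, 5.25, 5.84, 6.28]
LABELS = ["Poor", "Only Fair", "Low Average", "Average",
          "High Average", "Good", "Very Good"]

def describe(s):
    """Descriptive term for Standard score s; a score on a boundary goes to the higher interval."""
    return LABELS[bisect_right(BOUNDARIES, s)]

# Some of Subject B's Standard scores from Fig. 8:7
subject_b = {"Science Information": 3.2, "Paragraph Comprehension": 4.0,
             "Items Attempted": 4.6, "Scholastic Aptitude": 4.8, "Per Cent Correct": 5.3}
for test, s in subject_b.items():
    print(f"{test:<26}{s:>4}  {describe(s)}")
```

In this sketch Subject B's Science Information score of 3.2 falls in the "Poor" interval, in agreement with the discussion of her psychograph.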


EXERCISES 

1. Summarize the essential properties and implications of the normal bell-shaped 
distribution. 

2. What properties do a rectangular distribution and the normal bell-shaped dis­
tribution have in common?

3. Why are the “practical limits” of the normal bell-shaped distribution taken as 
equal to plus and minus three standard deviation units from the mean? 

4. What fundamental purpose in measuring people’s abilities is served by z scores? 

5. Under what circumstances can z scores be unambiguously interpreted in terms of 
centile intervals? 





6. Compute the z score equivalents of the following original scores for a distribution 
whose mean is 85.3 and whose standard deviation is 14.5: 

a. an original score of 92.0 

b. an original score of 46.7 

c. an original score of 112.0 

d. an original score of 85.4 

e. an original score of 58.0 

7. On the assumption that the distribution in the preceding exercise is of the normal, 
bell-shaped type, indicate the centile intervals in which each of the computed 
z scores lies. 

8. Why is the value of the standard deviation of a distribution always larger than 
the value of the average deviation? 

9. Convert the z scores obtained in Exercise 6 to Standard scores. 

10. On the assumption that a distribution is of the normal bell-shaped type, and that 
its mean is 125.0 and its standard deviation 30.0, determine original score values 
of the following Standard scores: 

a. 5.2

b. 2.5

c. 7.6

d. 4.1

e. 6.3

11. On the assumption that the Bennett Mechanical Comprehension test scores in
Fig. 8:5 are normally distributed, determine the original score equivalents of
the following:

a. a Standard score of 6.2

b. a z score of -1.1

12. What fundamental research purpose is served by an individual psychograph or 
profile chart? 

13. On what assumptions is the use of individual profile charts based? 

14. Devise a graphic method for comparing on the same psychograph the individual 
results of the two persons whose psychographs are presented in Figs. 8:6 and 8:7. 



CHAPTER 9 


The Product-Moment Method* for the
Correlation of Variates 

A. THE LINEAR CORRELATION OF BI-VARIATES 

The lengths of the radii and circumferences of circles may vary; however,
the association between the length of the radius and the circumference of
circles is such that the relation between them is perfect. That is, any known
variation in the length of the radius of a circle is accompanied by a definite
amount of change in the length of the circumference of a circle. This rela­
tionship is expressed by

$$C = 2\pi r, \qquad \text{or} \qquad r = \frac{C}{2\pi}$$

Similarly, the height and width of the sides of a square are perfectly related. 
If the height of a square equals y and its width equals x, then x is always equal 
to y. Knowing the length of either side, we can compute, without error, the 
length of the other. Thus, 

$$y = x, \qquad \text{or} \qquad x = y$$

These are examples of relations which are perfect; that is, the relations are 
such that there is no variation in the length of the circumferences of circles 
with a given size of radius and no variation in the length of the side of squares 
with a base of a specified length. 

In contrast, the relations characteristic of biological and social phenomena 
are not perfect. Co-relations between attributes or aspects of such phenomena 
are expressions of some degree of co-association or co-variability. Such
co-relations may range from no correlation at all to values approaching perfect
correlation. If the phenomena of the biological and social sciences were as
strictly related as the mathematical properties of geometric figures, there
would never have been any occasion for the development of statistical
methods. It was because observation and measurement showed such
relationships to be variable that the special techniques of applied mathematics,
i.e., statistical methods, were developed. Historically, of course, the discovery 
was not so much finding that the co-relations of natural and social phenomena 
are variable as it was finding that these apparently chaotic and very complex 

* This method of correlation is called the product-moment method because it is based on
the method of moments (mean and standard deviation) described in chap. 7.





196 THE PRODUCT-MOMENT METHOD FOR THE CORRELATION OF VARIATES 

phenomena are less chaotic than originally imagined. As was indicated in 
Chapter 1, Quetelet and Galton were especially instrumental in making 
systematic observations and developing methods that served to describe the
“law and order” characteristic of many aspects of these kinds of events. 

The study of the possible relationships between two or more attributes or 
characteristics has been integral to the development of both the biological 
and the social sciences because it is through such studies that problems of 
law and causation in these fields have been opened to investigation. The 
analysis of laws and causal relations among variable phenomena is basically 
dependent upon the method of correlation. The statistical problem is one of 
employing a mathematical technique that will yield a satisfactory index of 
the degree of co-variation. Casual observation may suggest that short people 
weigh less than tall people and that tall people weigh more than short people, 
or, conversely, that people who weigh less tend to be shorter than people 
who weigh more. However, casual observation also indicates that the
relationship between the height and weight of individuals is not perfect. The
problem therefore is one of determining the degree of co-variation in height
and weight. As shown in Chapter 4, the degree of co-variation between two
co-related variables is expressed mathematically by the correlation coefficient.
Such a coefficient, whose value depends upon the degree of co-variation
manifest in the relation of two variables, may range from 0 (no correlation at all)
to a value approaching +1.0 or -1.0 (in other words, approaching perfect
correlation).

Casual observation also indicates that there is some correlation between 
the height and age of persons during the period of growth, but that after 
maturity there is likely to be very little correlation between these two
variables. Psychological research has revealed that there is some degree of
correlation between people’s performances on psychological tests and their actual
behavior in educational or working situations. The use of the Army
classification tests is based on the well-tested observation that there is a relation
between achievement on these tests and performance in many types of Army 
training and occupational situations. Similarly, research has clearly indicated 
that there is some degree of correlation between personality appraisals of 
individuals and their capacity to succeed in various types of training activities, 
such as dive bombing. 

Correlation studies of the relationships between two attributes or factors 
are usually made for the purposes and problems of sampling and analytical 
statistics rather than for descriptive statistics alone. However, the degree of 
possible correlation between the empirical data of two variables is a descriptive 
problem. It is for this reason that methods for estimating and computing the 
degree of correlation between two variables are presented in the first part 
of this book. Later, in Chapters 16-18, we shall present implications of and 
procedures for correlation methods employed in studying populations through 
the analysis of sample data. 



THE LINEAR CORRELATION OF BI-VARIATES 197

Pearson’s Product-Moment r 

The product-moment correlation coefficient, developed by Karl Pearson
and symbolized by r, is often referred to as the Pearson r. It is equal to the
ratio of the mean of the products of the paired deviations for the two
variables correlated to the product of their respective standard deviations. Thus:

[9.1] Pearson’s product-moment correlation coefficient (r):

$$r = \frac{\Sigma(xy)/N}{\sigma_x \sigma_y} = \frac{\Sigma(xy)}{N \sigma_x \sigma_y}$$
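Formula [9.1] translates directly into code. Below is a minimal Python sketch. It assumes that x and y are deviations from the respective means and that the standard deviations are computed with N in the denominator, consistent with [9.1]. The function name and sample data are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product-moment r = sum(xy) / (N * sigma_x * sigma_y), with x, y deviations from the means."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("need two non-empty lists of equal length")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    sigma_x = sqrt(sum(d * d for d in dev_x) / n)  # standard deviation, N in the denominator
    sigma_y = sqrt(sum(d * d for d in dev_y) / n)
    return sum(a * b for a, b in zip(dev_x, dev_y)) / (n * sigma_x * sigma_y)

print(pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```

For these five hypothetical pairs, the sum of the deviation products is 8 and N·σx·σy = 5 × √2 × √2 = 10, so r = 0.8.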


Before describing methods for the computation of r (Section C of this chapter) 
we shall give the background of its development and some of the
considerations on which it is based, and present a graphic method by which it can be
obtained. The latter will in particular serve to illustrate what r means, 
regardless of the method used to determine it. 

The Cross-Tabulation of Bi-Variate Data 

We saw in Chapter 4 that cross-tabulation of the data of two attributes is 
essential to studying the possible correlation between them. This is generally 
the case, whether the co-relationships being investigated are for the data of 
non-variable attributes or for the data of variates. The cross-tabulation of 
the data of bi-variates involves basically the same procedure as the
cross-tabulation of the data of dichotomized attributes into a fourfold table. The
chief difference is the fact that the cross-tabulation of bi-variate data is made 
in reference to a correlation matrix that is set up for continuously distributed 
variables. The simplest method of determining by inspection whether there is 
any noticeable correlation between two variables is to cross-tabulate the data 
into a graph known as a scattergram. 

The Scattergram of Bi-Variate Data 

A scattergram represents a cross-tabulation of bi-variate data that is usually 
made directly from the original data of each variable. That is, the data of 
the two variables correlated are not grouped into a limited number of class 
intervals but are plotted directly from the original measurements. 

A scattergram that describes a fairly high degree of positive correlation is 
presented in Fig. 9:1. The matrix used for plotting a scattergram consists of the
coordinate axes of a geometric field. The observed values of one variable are
scaled on the x-axis, or abscissa. The observed values of the other variable are
scaled on the y-axis, or ordinate. When the two variables being correlated
are known or presumed to be causally related in such a way that variations
in one variable are in some way and to some degree dependent upon variations
in the other variable, the latter is called the independent variable and is scaled
on the x-axis. The former is called the dependent variable and is scaled on
the y-axis.





In the height-weight data in Fig. 9:1, the height measurements have been
scaled on the x-axis and the weight measurements on the y-axis. However,
this should not be interpreted as necessarily implying that height is an inde­
pendent variable such that weight is dependent upon it. Height and weight
are both attributes or qualities of organisms, and hence any relationship
between them is a function of the fact of organic unity. But the height or
length of organisms is often thought of as being a more fundamental aspect
or quality of the organism than is weight. For one thing, the weight of
organisms is affected by environmental circumstances of growth and of living
more than is their height. Nevertheless, such a difference does not in itself
indicate that height is an independent variable and weight dependent on it.
It would not be inaccurate, therefore, to scale the height measurements on
the y-axis and the weight measurements on the x-axis.

Fig. 9:1. Scattergram of the Heights and Weights of One-Year-Old Girl Infants*

* Data from J. G. Peatman and R. A. Higgons, “Growth Norms from Birth to the Age
of Five Years: A Study of Children Reared with Optimal Pediatric and Home Care,”
American Journal of Diseases of Children, 44:1233-1247, 1938.


Whichever way any two variables are scaled, the result should not carry 
the implicit assumption that variations in the quality or factor scaled on 
the y-axis are dependent upon variations in the factor or quality scaled on 
the x-axis. The choice of axes for the two variables being correlated is usually 
purely arbitrary, and any evaluation of the result with respect to the problem 
of causality must be based upon information about the nature of the variables 
themselves, rather than upon the way in which they happen to be scaled on 
the coordinate axes. 


Laying off the Scales of a Scattergram 

A standard procedure is usually followed in scaling the measurements of
each variable for a scattergram. Although the procedure can be varied,
following the standard practice gives a descriptive picture that is unambiguous in
its implications. The standard procedure has already been described in
Chapter 4, in connection with the correlation of the dichotomized data of
bi-variates. As we saw there, a cross-tabulation of correlational frequencies
is less likely to be interpreted ambiguously if the scales of each variable are
laid off so that the result corresponds to the implications of a geometric field.
A geometric field is a matrix which is divided into four quadrants by the
co-ordinate axes. These quadrants are established for correlation by the
intersection of lines drawn perpendicularly from the respective mean values
of each scale. Such a relationship is illustrated in Fig. 9:2.

Fig. 9:2. The Four Quadrants of a Geometric Field

The upper right-hand quadrant of a geometric field is positive, because
high values of one variable associated with high values of the other variable
are located in this part of the total field. This is usually designated as
Quadrant I. Similarly, low values of one variable associated with low values
of the other are located in the lower left-hand quadrant, designated as
Quadrant III. This quadrant is likewise positive because the variations, or deviations,
of measures from the mean of each variable are negative, and the product of
paired negative deviations is positive. The remaining two quadrants, II and IV,
are both negative. Any paired observations located in Quadrant II, in the upper
left-hand corner of the figure, represent instances in which the measures of
the x variable are less than the mean of x, and the measures of the y variable
are greater than the mean of y. Conversely, any paired observations located
in Quadrant IV represent instances in which the measures of the x variable are
greater than the mean of x, and the measures of the y variable are less than the
mean of y.

In order, therefore, for a scattergram to yield a result whose implications 
correspond to the positive character of Quadrants I and III and the negative 
character of Quadrants II and IV, the measures of each variable must be 
scaled as follows: 

1. The measures of the x variable are scaled on the abscissa, beginning with
the smallest values at the left side of the scale and ending with the largest
values at the right side of the scale. (In the height measures in Fig. 9:1, the
least height was 27 inches and the maximum height was 32½ inches.)

2. The measures of the y variable are scaled on the ordinate, beginning with
the smallest values at the bottom of the scale and ending with the largest
values at the top of the scale. (In the weight data in Fig. 9:1, the least
weight was 16½ pounds and the maximum weight was 28 pounds.)

When the measures of two variables being correlated are scaled in this
manner, the scattergram gives a result that associates paired high values in
positive Quadrant I, paired low values in positive Quadrant III, and low
values of one variable with high values of the other variable in the negative
Quadrants II and IV. Actually, of course, correlation is not simply a question
of whether values are large or small; rather, the first step in the problem is the
location of paired deviations, taken from the means of their respective
variables. In other words, positive deviations of both variables are associated in
Quadrant I; negative deviations of both variables are associated in Quadrant
III; and negative deviations of one variable are associated with positive
deviations of the other variable in Quadrants II and IV.
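The sign logic of the four quadrants can be expressed in a short Python sketch. It draws the axes through the two means and counts the paired observations in each quadrant. Pairs lying exactly on an axis are counted separately. That is an assumption of the sketch, since the text does not discuss such cases. The function name and data are hypothetical.

```python
def quadrant_counts(xs, ys):
    """Count paired observations in Quadrants I-IV, with the axes drawn through the two means."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    counts = {"I": 0, "II": 0, "III": 0, "IV": 0, "on an axis": 0}
    for x, y in zip(xs, ys):
        dx, dy = x - mean_x, y - mean_y
        if dx == 0 or dy == 0:
            counts["on an axis"] += 1
        elif dx > 0 and dy > 0:
            counts["I"] += 1    # both deviations positive: product positive
        elif dx < 0 and dy > 0:
            counts["II"] += 1   # x below its mean, y above: product negative
        elif dx < 0 and dy < 0:
            counts["III"] += 1  # both deviations negative: product positive
        else:
            counts["IV"] += 1   # x above its mean, y below: product negative
    return counts

print(quadrant_counts([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
```

A preponderance of cases in Quadrants I and III points to positive correlation. A preponderance in Quadrants II and IV, as in Fig. 9:3, points to negative correlation.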

In psychological measurement there is a logical exception to scaling
magnitudes in the order just described for the x- and y-axes. This exception
holds when either one or both of the two variables being correlated yield
measures such that the higher values mean less psychological ability or
capacity, and the lower values mean greater psychological ability or capacity.
Such a result typically appears in any measure of ability based upon the time
required to achieve or complete a given number of tasks or items. Under these
circumstances, the person with the smallest test score (in terms of time
required) manifests psychologically greater ability than the person with the
highest score (in terms of time required). In such cases the order of the
scores is usually reversed so that the direct psychological implications of
the scattergram will correspond to the usual meaning of a geometric field.
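Reversing the order of time scores can be sketched as a reflection of the scale: each score s becomes (max + min) - s, which inverts the order while preserving the spacing between scores. The sketch below uses hypothetical task times and ability ratings, and the helper names are illustrative; it shows that the reversal changes only the sign of the product-moment r, not its size.

```python
import math


def pearson_r(xs, ys):
    """Product-moment r computed from deviations about the two means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def reverse_scale(scores):
    """Reflect scores so the smallest becomes the largest and vice versa."""
    hi, lo = max(scores), min(scores)
    return [hi + lo - s for s in scores]


# Hypothetical data: seconds needed to finish a task (less time = more
# ability) and a rating of the same ability for six persons.
seconds = [40, 35, 50, 30, 45, 55]
rating = [7, 8, 6, 9, 5, 4]

r_raw = pearson_r(seconds, rating)                 # negative
r_rev = pearson_r(reverse_scale(seconds), rating)  # same size, positive
print(round(r_raw, 3), round(r_rev, 3))            # -0.943 0.943
```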

In practice, the means of each variable are often not drawn on a scattergram,
because a scattergram, as ordinarily used, is a preliminary device to depict
such correlation as may be present between two variables and is often made
before the means are computed. If, however, the means of each variable are
available, it is best to draw the intersection of the axes as projections from
these means, as was done for the data in Fig. 9:3. It is to be observed that
the great majority of the paired measures in this scattergram are in the
negative quadrants (II and IV). Only 13 cases are located in positive
Quadrant I, and only 9 in positive Quadrant III.*


* Many years ago Sheppard, the English statistician, developed a geometric method for
estimating a correlation coefficient from the ratio of frequencies in the four quadrants of
the geometric field. The methods of tetrachoric correlation, described in the next
chapter, and phi correlation (chap. 4) are based upon ratios of positive and negative
correlational frequencies.





As already indicated, the principal use of a scattergram is to give a graphic 
picture of the correlation between two variables, with respect to both degree 
and nature of the relationship. In the height-weight measures in Fig. 9:1, the 
degree of relationship is fairly well marked; the nature of the relationship is 
positive and appears to be linear. An inspection of the dots on the scattergram 
will bear out this statement. 


The Assumption of Linear Correlation 


The co-relation between any two variables can be described as either linear
or non-linear. The product-moment method of correlation is based upon the
assumption that such relationship as exists between two variables can be
adequately described by a linear rather than a curvilinear function. In other
words, the method is based upon a straight-line relationship. This means that
for each successive change or difference in the measures of one variable,
there is a proportionate change of a constant amount in the other variable.
The direction of the change may be negative or positive. The term rectilinear
correlation is synonymous with linear correlation. By contrast, curvilinear
correlation occurs when the changes in the relationships are not of a
constant, proportionate amount, and consequently cannot be described
adequately by a straight line.

Fig. 9:3. A Scattergram That Illustrates Negative Correlation* (abscissa: x variable)
* Note that the correlation is linear and that as the degree of correlation increases,
whether negative or positive, the scatter tends to form the shape of an ellipse that
becomes increasingly narrow. The scatter of zero correlation tends to form a circle;
cf. Fig. 9:4.
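The distinction can be checked numerically when the x values are equally spaced: successive differences in y are constant for a straight-line function and change from step to step for a curvilinear one. The two functions below (y = 3x + 2 and y = x²) are arbitrary illustrations, not data from the text.

```python
def first_differences(values):
    """Change in y from each x value to the next."""
    return [b - a for a, b in zip(values, values[1:])]


xs = [1, 2, 3, 4, 5, 6]                # equally spaced x values
linear = [3 * x + 2 for x in xs]       # straight line: y = 3x + 2
curved = [x * x for x in xs]           # curvilinear: y = x squared

print(first_differences(linear))  # [3, 3, 3, 3, 3]
print(first_differences(curved))  # [3, 5, 7, 9, 11]
```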

Linearity of a co-relation is obviously manifest when the scatter of the
paired measurements of bi-variates clusters about an imaginary straight line.
That the correlation shown in Fig. 9:1 is linear is apparent. A straight line
can also readily be applied to the scatter in Fig. 9:3, illustrating negative
correlation. However, in Fig. 9:4, illustrating zero correlation, it may not be
so apparent that a line of any kind can be fitted to the scatter of the data.
There is obviously no correlation between the two variables. That is, low
values of the x variable are associated with as many high as low values of the
y variable, and, similarly, high values of the x variable are associated with
as many low as high values of the y variable.

Fig. 9:4. A Scattergram That Illustrates Zero Correlation

In order to fit a line that will describe the trend of non-correlation, the
average variation in the values of one variable must be plotted with respect
to successive values of the other variable. This has been done in Fig. 9:5
(with the data from Fig. 9:4) for the variations of the y variable associated
with successive class-interval values of the x variable. It is now apparent
that a horizontal straight line can be fitted to the data. The slope of this
line is zero.

Fig. 9:5. Average Variation of Measures of y* (abscissa: x variable)
* This illustrates the fact that the zero correlation of Fig. 9:4 is linear.

The method of product-moment correlation which yields the coefficient r (see
Formula 9:1) is based upon the assumption that the co-variation observed in a
set of observations can be adequately described by a straight-line function.
At this point, we wish to emphasize that the making of a scattergram (or
correlation chart, see page 207) is important in any study of the correlation
between two variables, because the graph itself provides a picture that shows
whether such relationship as may exist is in fact satisfied by a linear
function. If it is, the product-moment method of correlation is appropriate.
But if the co-relationship shown in the scattergram differs considerably from
a linear type of relation, non-linear methods of correlation must be employed.

Fig. 9:6. A Scattergram That Illustrates Non-Linear Correlation

In practice, because of a research worker’s familiarity with many types of
data, it is frequently safe to assume that the co-relationship is linear. In
such instances, machine methods for computing the correlation coefficient may
be warranted, despite the fact that they provide no cross-tabulated picture of
the actual result. But whenever the variables being correlated are unfamiliar,
or there is any question as to the nature of the possible co-relationship, a
scattergram should be made to determine whether the product-moment method is
appropriate for computing the correlation coefficient. The data in Fig. 9:6
represent a case in point. Here the paired observations do not scatter along a
straight line nearly so well as along a curved line. If the product-moment
method were used to express the degree of correlation between these two
variables, the result would be very misleading. The correlation coefficient, r,
would be considerably lower in value than a coefficient derived by a method
that takes into account the curvilinear character of the association between
two variates.
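An extreme case makes the point concrete. In the sketch below (hypothetical values chosen for symmetry), y is completely determined by x through y = x², yet the product-moment r is exactly zero, because the positive and negative deviation products cancel. A coefficient that allows for curvature would register the association; r does not.

```python
import math


def pearson_r(xs, ys):
    """Product-moment r computed from deviations about the two means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]          # a perfect, but curved, relationship
print(pearson_r(xs, ys))          # 0.0
```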




Plotting the Bi-Variate Data of a Scattergram 

The data upon which the scattergram in Fig. 9:1 is based are presented in
Table 9:1. The height-weight measurements of the first infant, No. 1, are
31.0 inches and 25 pounds respectively. This particular correlational
frequency is designated on the scattergram in Fig. 9:1. In plotting each
correlational frequency, the height measure is first located on the height
scale at the bottom of the scattergram, and the weight measure is then located
on the vertical scale at the left. Imaginary lines perpendicular to each scale
are then projected from each of these two scale values into the geometric
field, and a dot is made at their point of intersection. Each dot on a
scattergram, therefore, represents a correlational frequency whose y and x
values can always be obtained by referring to the scales of the coordinates.
Thus, the dot for Infant No. 1 represents (reading up) a height measure of
31.0 inches and (reading across to the left) a weight measure of 25 pounds.

Table 9:1. Height and Weight Measurements of One-Year-Old Girl Infants

Case  Height  Weight   Case  Height  Weight   Case  Height  Weight
No.    (in.)  (lbs.)    No.   (in.)  (lbs.)    No.   (in.)  (lbs.)
   1    31.0    25.0     51    28.0    20.0    101    29.0    20.0
   2    30.0    24.5     52    29.5    20.0    102    31.0    26.0
   3    29.5    21.0     53    28.5    20.0    103    31.0    24.0
   4    29.5    26.0     54    27.5    19.0    104    32.5    26.5
   5    27.5    20.0     55    29.0    27.0    105    28.5    20.5
   6    30.0    22.0     56    30.5            106    29.5    18.5
   7    30.0    21.0     57    31.0    23.5    107    30.0    21.5
   8    30.0    23.5     58    29.0    21.0    108    28.0    19.5
   9    29.0    18.5     59    28.0    20.0    109    30.0    24.0
  10    29.0    22.0     60    30.0    22.5    110    29.5    22.5
  11    29.0    23.5     61    29.0    23.0    111    31.5    23.0
  12    31.5    25.0     62    29.5    23.0    112    29.5    19.0
  13    29.0    21.5     63    28.0    18.0    113    30.5    19.5
  14    29.5    20.5     64    29.0    20.5    114    29.5    22.5
  15    28.5    21.0     65    29.0    24.0    115    28.0    17.5
  16    29.5    23.0     66    30.5    22.0    116    29.5    19.0
  17    32.0    24.0     67    30.0    24.0    117    32.0    26.0
  18    30.0    19.5     68    29.0    18.5    118    30.5    22.5
  19    28.0    19.0     69    29.5    21.5    119    27.0    17.5
  20    30.0    25.0     70    30.0    20.5    120    28.5    18.0
  21    28.5    20.0     71    28.5    19.5    121    30.5    24.5
  22    29.0    21.5     72    30.5            122    29.5    20.5
  23    29.0    21.0     73    29.0            123    28.0    19.5
  24    30.0    25.0     74    29.0            124    28.0    18.0
  25    30.5    22.0     75    29.0            125    27.5    17.0
  26    29.5    23.0     76    29.5            126    30.5    19.5
  27    30.0    22.0     77    30.5    21.5    127    29.5    18.0
  28    29.5    23.0     78    27.0    20.0    128    31.5    23.0
  29    28.5    22.5     79    29.0    19.0    129    28.5    19.5
  30    29.0    20.5     80    29.5    24.0    130    27.5    17.5
  31    29.0    22.5     81    29.0    20.0    131    29.5    19.5
  32    30.5    25.5     82    29.0    18.0    132    30.0    20.0
  33    29.0    21.0     83    29.5    25.0    133    29.5    22.5
  34    28.5    22.0     84    28.0    19.5    134    30.5    23.0
  35    29.0    19.0     85    30.0    20.5    135    29.5    20.5
  36    30.0    21.5     86    31.5    23.5    136    27.5    20.5
  37    29.0    20.0     87    30.0    22.5    137    32.5    28.0
  38    29.5    21.0     88    30.0    24.0    138    30.5    27.0
  39    29.0    21.5     89    31.5    21.0    139    30.5    27.5
  40    28.0    21.0     90    30.5    28.0    140    28.5    20.5
  41    29.0    23.5     91    30.0    21.0    141    28.5    21.5
  42    31.0    20.0     92    29.0    17.5    142    29.5    22.5
  43    28.5    22.0     93    31.0    25.5    143    30.5    25.0
  44    27.0    17.0     94    27.0    17.0    144    32.5    25.0
  45    29.0    23.0     95    29.0    19.5    145    28.5    17.5
  46    29.0    22.5     96    30.5    22.0    146    28.5    20.5
  47    29.0    19.0     97    29.5    23.0    147    27.5    16.5
  48    29.5    23.0     98    29.5    22.0    148    30.5    26.0
  49    31.0    24.0     99    29.0    20.5    149    28.5    18.0
  50    28.5    21.0    100    29.5    23.5    150    28.0    19.5
                                               151    23.5    20.5

The second infant’s height and weight measures, as given in Table 9:1, are
represented on the scattergram by a dot at the intersection of imaginary lines
projected from a measure on the height scale equal to 30.0 inches and a
measure on the weight scale equal to 24.5 pounds. The paired data of the
remaining 149 correlational frequencies are plotted in turn, to give the
scattergram shown in Fig. 9:1. Sometimes, in making a scattergram, two or more
correlational frequencies will have the same x-variable value and the same
y-variable value. In this case, additional dots to represent the location of
such correlational frequencies are plotted in close proximity to the first
dot, rather than on top of it.
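When dots are placed by a program rather than by hand, the same rule can be applied mechanically. The sketch below is one way to do it, assuming a small horizontal nudge (the hypothetical `step` parameter) for every repeated (x, y) pair; Cases 44 and 94 of Table 9:1 (27.0 inches, 17.0 pounds) are an actual example of coincident measures.

```python
from collections import defaultdict


def plotting_positions(pairs, step=0.05):
    """Return one (x, y) plotting position per observation.

    Observations sharing the same (x, y) value are nudged to the right by
    multiples of `step` (in x-scale units) so that no dot hides another.
    """
    already_plotted = defaultdict(int)
    positions = []
    for x, y in pairs:
        k = already_plotted[(x, y)]   # dots already drawn at this point
        already_plotted[(x, y)] += 1
        positions.append((x + k * step, y))
    return positions


# Cases 1, 44, and 94 of Table 9:1 (height in inches, weight in pounds).
pairs = [(31.0, 25.0), (27.0, 17.0), (27.0, 17.0)]
print(plotting_positions(pairs))
```

The nudge should stay well below the smallest scale division (half an inch on the height axis) so that each dot remains readable at its true value.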

The Correlational Frequency: Paired Associates 

Consideration of the kinds of statistical situations which lend themselves
to the method of correlation should make clear the fundamental nature of
bi-variate data. Essentially, the data of correlations are derived from the
measurements of two attributes or traits that are logically associated by
means of a group of paired observations. Each pair constitutes a correlational
frequency, and unless observations of the variates of two attributes or traits
have a factual basis for association by pairs, the method of correlation is
not applicable or relevant.

The costs of advertising two products can be compared, but they cannot be
correlated unless there is a group of paired costs that can be cross-tabulated.
The basis for associating such a group by pairs in the field of market research
is often calendar time. For example, the costs of advertising two products can
be cross-tabulated for a series of paired costs obtained for successive annual
or semi-annual periods. Similarly, time is the basis for pairing observations
of two variables that may not otherwise appear to have any relationship with
each other. In fact, pairing by time underlies many of the examples used to
illustrate spurious* correlation, such as a correlation between the annual
precipitation in New Zealand and the birth rate in Wisconsin. Precipitation
and birth rate for the same years can be paired; consequently, such data over
a period of twenty-five or fifty years would constitute a group of 25 or 50
correlational frequencies paired on the basis of time. In this case, time
would furnish the only basis for the associations by pairs; there is no other
logical or reasonable basis for associating two such variables.
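How pairing by time alone can manufacture a high coefficient is easy to simulate. In the hypothetical series below, each variable drifts upward with the year and has its own unrelated year-to-year fluctuation; nothing connects them except the calendar, yet r comes out high. The series and their constants are invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Product-moment r computed from deviations about the two means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


years = range(25)  # twenty-five annual observations, paired only by year
# Hypothetical series: a steady trend plus a fluctuation peculiar to each.
series_a = [10 + 0.8 * t + (1.0 if t % 2 else -1.0) for t in years]
series_b = [50 + 0.3 * t + (0.9 if t % 3 == 0 else -0.45) for t in years]

r = pearson_r(series_a, series_b)
print(round(r, 2))  # high, although the two series share only a time trend
```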

In psychology, biology, and the social sciences, the basis for correlational
frequencies is usually the individual organism. The paired associates for the
correlation between two variables such as height and weight, intelligence and
educational achievement, aptitude and personality ratings, attitude for A and
attitude for B, etc., are obtained in each instance from the measurements or
observations of the same persons. A group of individuals yields a number of
paired associates and hence provides relevant data for cross-tabulation and
correlation.

* Spurious correlation means that the correlation obtained between two
variables is due, in whole or in part, to factors other than those to which it
is ascribed.

Blood relationship or social relationships of various kinds also yield paired
associates, and hence a group of data that can be correlated. Galton’s original
correlational studies of inheritance were made from anthropometric data that
were paired for parent and offspring. E. L. Thorndike and others have studied
the relationship between the intelligence of siblings: brother-brother,
brother-sister, or sister-sister pairs. These are examples of bi-variates
derived from genetically related pairs. On the other hand, the possible
relationship between the intelligence, scholastic achievements, personalities,
or interests of best friends, husbands and wives, etc., has been studied by
the method of correlation. Intelligence test scores of individuals may be
paired with the intelligence test scores of their best friends; attitude
scores of husbands may be paired with the attitude scores of their wives. A
group of such paired associates is thus an example of bi-variates derived from
the data of socially related pairs. In correlational problems such as these, a
single attribute or trait is usually under consideration. That is, the data of
each correlational frequency are measurements or observations of the same
quality (for example, measurements of intelligence, all of which are derived
from the same test), but they are paired for genetically or socially related
persons.

Some experimental situations in psychology and related fields, especially
biology, also give rise to paired associates. This is particularly true of the
experimental method of equated groups, in which the subjects of both the
experimental and the control groups are matched, pair by pair. As we shall see
later (Chapter 14, Section F), the correlation between the results for
experimental and control groups, individually matched by pairs, is relevant
for testing the significance of the mean difference between two such groups.

Two groups of data can often be compared with respect to their central
tendencies, deviational tendencies, the form of their respective distributions,
etc. But two variables cannot be correlated unless there is a logical or
reasonable basis for cross-tabulating the data of each variable. Thus, the
scholastic achievements of seniors and juniors in a college can be compared but
not correlated, unless there is a meaningful basis for associating a particular
junior student with a particular senior student, whereby a series of
correlational frequencies can be established.
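In computational terms, the requirement is a key that links each observation of one variable to exactly one observation of the other. The sketch below uses hypothetical subject identifiers and scores, and the function name is illustrative; only subjects measured on both variables yield correlational frequencies, and unmatched scores are simply left out.

```python
def paired_values(scores_a, scores_b):
    """Pair two sets of scores on a shared key (here, a subject ID).

    Only keys present in both dictionaries produce a correlational
    frequency; the pairs are returned in sorted key order.
    """
    common = sorted(set(scores_a) & set(scores_b))
    return [(scores_a[k], scores_b[k]) for k in common]


# Hypothetical intelligence and achievement scores keyed by subject.
intelligence = {"s01": 104, "s02": 97, "s03": 121, "s04": 88}
achievement = {"s02": 55, "s03": 71, "s04": 49, "s05": 63}

print(paired_values(intelligence, achievement))
# [(97, 55), (121, 71), (88, 49)]
```

Two lists of scores with no shared key, such as the juniors and seniors in the example above, would produce no pairs at all.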

The Correlation Chart 

A picture of the distribution of correlational frequencies can also be
obtained from a correlation chart. Each of the correlational frequencies in
such a chart is represented not by a dot, but by a tally or by the total
number of such frequencies in each cell of the matrix. The result is
sufficiently similar to a scattergram to be almost as satisfactory as the
latter in depicting the bi-variate relationship. Furthermore, the correlation
chart has an advantage over the scattergram in that it is set up in such a way
that the correlation coefficient itself can be computed directly from the
cross-tabulated data.

Figs. 9:7 and 9:8 portray the height-weight data in Fig. 9:1 cross-tabulated
into correlation charts. Fig. 9:7 shows a tally of the correlational
frequencies. Fig. 9:8 represents the same result, but the correlational
frequencies of each cell are summed, and the totals for each cell are
indicated.

Fig. 9:7. The Correlation Tally*
* Height-Weight Data of Table 9:1.

Fig. 9:8. The Correlation Distribution* (abscissa: Height in Inches)
* From Tally of Fig. 9:7.

The basic purpose of a scattergram is well served by either the correlation
tally chart in Fig. 9:7 or the correlation frequency chart in Fig. 9:8. That
is, both figures show (1) that there is a fair degree of correlation; (2) that
the correlation is positive; and (3) that the bi-variate relationship is a
linear one, that is, it can be adequately described by a straight-line
function.

A correlation chart is constructed by procedures similar to those used in
setting up the matrix for a scattergram. The chief difference is that the data
of each variable are grouped into the class intervals characteristic of
frequency distributions. Thus in Figs. 9:7 and 9:8, the height measurements
have been distributed into 12 class intervals, each equal to half an inch. The
weight data have likewise been distributed into 12 class intervals, each equal
to one pound. The use of 12 class intervals for each of the two variables thus
produces a correlation chart or matrix with 144 cells. Instead of only the 2
by 2 or 2 by 4 type of correlation table described in Chapter 4, we have, in
the correlation of continuously distributed variables, 12 by 12-fold tables,
or some other combination that produces a large number of cells. In practice,
as was indicated in the development of frequency distributions in Chapter 5,
no more than 20 class intervals for a variable need to be employed. And if
fewer than 12 class intervals are used, it may be advisable to apply
Sheppard’s correction* to the standard deviation of each variable before r is
computed.
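Sheppard's correction itself is given on p. 167; for reference, its usual form subtracts h²/12 from the variance computed from grouped data, where h is the width of the class interval. A minimal sketch, assuming the standard deviation and h are expressed in the same units (the function name and the example values are hypothetical):

```python
import math


def sheppard_corrected_sd(sd_grouped, interval_width):
    """Standard deviation after Sheppard's correction for grouping:
    corrected variance = grouped variance - h**2 / 12."""
    corrected_variance = sd_grouped ** 2 - interval_width ** 2 / 12
    return math.sqrt(corrected_variance)


# Hypothetical grouped result: sigma = 2.0 pounds with one-pound intervals.
print(round(sheppard_corrected_sd(2.0, 1.0), 4))  # 1.9791
```

The correction shrinks the standard deviation only slightly unless the intervals are wide relative to the spread, which is why it matters most when few class intervals are used.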

The more marked the correlation between the data of two variables, the
more cells there will be with no correlational frequencies. In positive
correlations there will be more cells in the negative quadrants (II and IV)
with no frequencies, and in negative correlations there will be more cells in
the positive quadrants (I and III) with no frequencies.

The correlation tally in Fig. 9:7 was made directly from the original data in
Table 9:1. The height-weight measures for Infant No. 1 are represented by the
tally in the proper cell in Quadrant I. Similarly, the height-weight measures
of the fifth infant are represented by the tally in the proper cell in
Quadrant III. The distribution of the correlational frequencies in Fig. 9:8 is
obtained from the tally of cross-tabulations in Fig. 9:7. In order to
facilitate locating the correlational frequencies in Fig. 9:8, the mid-points
of the class intervals of each variable are used. Mid-point values, rather
than the class-limit values, are more relevant for further computational work
with the data of a correlation chart (cf. Figs. 9:9 and 9:12), because all
computations are based on the principles described in Chapter 7 for the mean
and the standard deviation. That is, the mid-point of each class interval is
assumed to be a representative value for all frequencies within the limits of
the interval.
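The cross-tabulation can be sketched as follows. The class boundaries are an assumption, since the text gives only the interval widths and mid-points: each interval is taken to run from half a width below its mid-point up to (but not including) half a width above it, with height mid-points from 27 inches in half-inch steps and weight mid-points from 17 pounds in one-pound steps. Under that assumption, the four 27-inch infants of Table 9:1 (Cases 44, 78, 94, 119) fall into the cells the text reports for the first column of Fig. 9:8.

```python
import math
from collections import Counter


def class_midpoint(value, lowest_midpoint, width):
    """Mid-point of the class interval containing `value`, assuming
    intervals [mid - width/2, mid + width/2)."""
    lower_limit = lowest_midpoint - width / 2
    index = math.floor((value - lower_limit) / width)
    return lowest_midpoint + index * width


def correlation_chart(pairs):
    """Count correlational frequencies per (height, weight) cell."""
    cells = Counter()
    for height, weight in pairs:
        h_mid = class_midpoint(height, lowest_midpoint=27.0, width=0.5)
        w_mid = class_midpoint(weight, lowest_midpoint=17.0, width=1.0)
        cells[(h_mid, w_mid)] += 1
    return cells


# Cases 1, 44, 78, 94, and 119 of Table 9:1 (height in inches, weight in pounds).
sample = [(31.0, 25.0), (27.0, 17.0), (27.0, 20.0), (27.0, 17.0), (27.0, 17.5)]
print(correlation_chart(sample))
```

Each key of the resulting `Counter` is one cell of the matrix, identified by its two mid-points, so the chart can be fed directly into computations that treat mid-points as representative values.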

B. ESTIMATION OF PRODUCT-MOMENT r

Before presenting methods for computing the product-moment correlation
coefficient, we shall illustrate the mathematical implications of a computed r
by estimating the coefficient from regression lines fitted to a group of
bi-variate data.

Fitting Linear Regression Lines to Bi-Variate Distributions 

The cross-tabulated relationship (whether zero, positive, or negative)
between two variables, as illustrated by the correlation charts in Figs. 9:7
and 9:8, represents a bi-variate distribution. It has already been observed
that the bi-variate data in Fig. 9:8 are distributed in such a manner that a
straight line can be used to describe the nature of the relationship.

* For Sheppard’s correction, see p. 167.

Mathematically, two problems arise in treating bi-variate distributions in
which two attributes or qualities, x and y, are not invariantly related. If
there were perfect correlation, there would be no particular mathematical
problem, because x could be expressed as an invariant function of y, and y
could be expressed as an invariant function of x. But, as we saw earlier, the
statistical problem of correlation arose because bi-variate relations for
phenomena of the biological and social sciences are not perfect or invariant.
Hence, the first aspect of the mathematical problem is to find the best
algebraic formulation that will express the relationship between two
variables, x and y. The second aspect consists in developing a method that
will denote the degree of correlation between two such variables.

One of the simplest types of algebraic relationships is the straight-line
function. This is a linear equation that may be expressed as follows:

$$y = mx + k \tag{9:2}$$

Linear equation (y on x)

where m is a constant denoting the slope of the straight line, and k defines
the point at which the line intercepts, or cuts across, the y-axis of the
coordinates. The equation may also be expressed as follows for the same data
(m and k then denoting the slope and intercept of this second line, which in
general differ from those of the first):

$$x = my + k \tag{9:3}$$

Linear equation (x on y)

The algebraic formula for a linear relationship between two variables can
thus be stated in two ways: either y as a function of x, or x as a function of
y. These two equations are called regression equations, and straight lines
fitted to bi-variate data are called regression lines.*
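For comparison with the fitting by inspection used in this section, the two regression lines can also be fitted by least squares, a standard method that the text has not yet introduced at this point. The sketch below (hypothetical five-point data, deviations taken from the means, illustrative function name) shows that the slope of y on x and the slope of x on y are different numbers unless the correlation is perfect; for least-squares lines their product equals r².

```python
def regression_slopes(xs, ys):
    """Least-squares slopes of the two regression lines, using deviations
    from the means: y = m_yx * x (y on x) and x = m_xy * y (x on y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))
    m_yx = sxy / sum(a * a for a in dx)
    m_xy = sxy / sum(b * b for b in dy)
    return m_yx, m_xy


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
m_yx, m_xy = regression_slopes(xs, ys)
print(m_yx, m_xy)   # 0.6 1.0
print(m_yx * m_xy)  # 0.6, which is r squared for these data
```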

We shall now illustrate a procedure for fitting straight lines to the bi-variate
data in Fig. 9:8, and thus describe a method for estimating the degree of
correlation between the heights and weights of the 151 infants.


The Variation of Weight (y) with Respect to Height (x) (y on x)

Two straight-line equations may be used to express the correlation between
two variables whose relationship is linear. For the height-weight data in
Fig. 9:8 these equations are for (1) the variation of weight with respect to
height, and (2) the variation of height with respect to weight. We shall first
consider the variation of weight with respect to height.

* Linear equations to express the algebraic relationship between bi-variates
were employed by Galton in his development of a method of correlation for
studying the relationship between characteristics of parents and their
offspring. He observed, for example, that tall parents had, on the average,
offspring shorter than themselves and that short parents had, on the average,
offspring taller than themselves. He described this phenomenon as the law of
filial regression (the heights of offspring regress toward the mean of the
parental generation), and hence the equations and lines describing the
relationship came to be known as regression equations and regression lines.

It is apparent from Fig. 9:8 that not all the infants with a height of 27
inches have the same weight; rather, their weight varies from 17 to 20 pounds
(these are the mid-point values of each class interval for which there are
correlational frequencies). Similarly, not all the infants with a height of
32½ inches have the same weight; their weight varies from 25 to 28 pounds.
These represent the variation in weight of the infants at the extremes of the
height scale. But such variation in weight is characteristic of the infants of
a given height at any point on the height scale. Thus, the infants with a
height of 29½ inches vary in weight from 18 to 26 pounds. Furthermore, this
variation in weight follows an important pattern. Although the shortest
infants may be as variable in weight as the tallest infants, the actual range
in weight characteristic of their variation is different. Thus, the heaviest
of the shortest infants (those with a height of 27 inches) weighed 20 pounds,
whereas the lightest of the tallest infants (those with a height of 32½
inches) weighed 25 pounds. There is therefore no overlapping in the weight of
the shortest and tallest infants. On the other hand, as we go along the height
scale from one class interval to the next, we see that there is a considerable
degree of overlapping in the weight of each successive height group.

If the overlapping of the range in actual weights from one height class 
interval to another were at a maximum, the correlation would be zero, because 
maximum overlapping in the variability of weights would signify not only 
that the shortest infants were just as variable in weight as the tallest infants, 
but that the actual range in their respective weights was the same. In other 
words, if the correlation were zero, the shortest infants would vary in weight, 
say, from 17 to 28 pounds, and the tallest infants would vary in weight from 
17 to 28 pounds. The extent of the variation of y with respect to the successive 
class-interval values of x is thus basic to the meaning and interpretation of a 
measure of correlation. The greater the variation in y for given values of x,
the less the correlation between the two variables; conversely, the less the
variation in values of y for given values of x, the more marked the
correlation will be.
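The link between the variation in y for given values of x and the size of r can be illustrated with two hypothetical sets of data that follow the same trend (y rising one unit per unit of x) but scatter about it by different amounts. The sketch measures scatter as the root-mean-square deviation of y from the least-squares line of y on x; this is an assumed stand-in for the standard error of estimate that the text takes up in Chapter 16, and the function names are illustrative.

```python
import math


def pearson_r(xs, ys):
    """Product-moment r computed from deviations about the two means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def scatter_about_line(xs, ys):
    """Root-mean-square deviation of y from the least-squares line of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    m = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    residuals = [(y - my) - m * (x - mx) for x, y in zip(xs, ys)]
    return math.sqrt(sum(e * e for e in residuals) / n)


xs = list(range(1, 11))
wobble = [1, -1, -1, 1, 1, -1, -1, 1, 1, -1]       # hypothetical departures
tight = [x + 0.5 * w for x, w in zip(xs, wobble)]  # little scatter
loose = [x + 3.0 * w for x, w in zip(xs, wobble)]  # much scatter

for label, ys in (("tight", tight), ("loose", loose)):
    print(label, round(pearson_r(xs, ys), 3), round(scatter_about_line(xs, ys), 2))
```

Both sets rise at essentially the same rate, so the difference in r comes from the scatter about the line, not from the trend.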

If the correlation between x and y were perfect (1.00), all the infants of a
given height would have exactly the same weight; there would be no variation
in weight for a given height. This is what is meant by an invariant
relationship. Just as the standard deviation is used to summarize the degree
of variation characteristic of a single variable, so it is also used to
summarize the degree of scatter characteristic of a bi-variate distribution.
When the scatter about the regression line is zero, the correlation is
perfect. The use of σ to measure the degree of scatter will be described later
(cf. the standard error of estimate, Chapter 16, Section B). At this point it
should be emphasized that a measure of the scatter of correlational
frequencies about a straight-line function provides an index of the
correlation between the two variables. The greater the scatter, the less the
correlation; the less the scatter, the greater the correlation. Similarly, the
greater the scatter, the less accurately values of one variable can be
predicted from given values of the other; and, conversely, the less the
scatter, the greater the accuracy with which values of one variable can be
predicted from given values of the other.

Fig. 9:9. Means of the Variations in Weight for Successive Class-Interval
Measurements of Height* (abscissa: height in inches; ordinate: weight, 17.0 to
29.0 pounds; bottom row: Mean Weight for Each Class Interval of Height, by
column)
* From the data of Fig. 9:8.

The variation of weights with respect to the heights shown in Fig. 9:8 is
summarized in Fig. 9:9. The scatter in Figs. 9:1, 9:7, and 9:8 has already
revealed that the variation of weight with respect to height can in all
likelihood be described by a straight-line equation. That this is in fact the
case is shown in Fig. 9:9. The values plotted represent the means of the
variations in weight for the mid-point measures of height of successive class
intervals. As indicated at the bottom of the figure, the mean of the variation
in weight for the infants with a height of 27 inches is 18.0 pounds. This mean
value was obtained from the four correlational frequencies in the first column
of Fig. 9:8. An inspection of Fig. 9:8 indicates that of the four infants
whose height was 27 inches:

2 infants weighed 17 pounds
1 infant weighed 18 pounds
1 infant weighed 20 pounds


The sum of the weights of these four infants is 72 pounds, and their mean
weight is therefore 18.0 pounds. Similarly, the mean of the variation in
weight of the infants with a height of 27½ inches is 18.7 pounds; the mean
weight of those with a height of 28 inches is 19.5 pounds, etc. Thus, a change
in height is accompanied by an average change in weight.
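Each plotted mean is just a frequency-weighted mean of the weight mid-points in one column of the chart. In the sketch below (function name illustrative), the 27-inch column is taken from the text; the 31½-inch column is tallied from Table 9:1 on the assumption that each one-pound interval is centred on a whole pound, and it reproduces the mean of 23.2 pounds cited below for that column.

```python
def column_mean(column):
    """Mean weight for one height column of a correlation chart.

    `column` maps each weight mid-point (pounds) to its frequency.
    """
    n = sum(column.values())
    return sum(weight * f for weight, f in column.items()) / n


height_27 = {17: 2, 18: 1, 20: 1}              # the four 27-inch infants
height_31_5 = {21: 1, 23: 2, 24: 1, 25: 1}     # the five 31.5-inch infants

print(column_mean(height_27))              # 18.0
print(round(column_mean(height_31_5), 1))  # 23.2
```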

This is the meaning of statistical correlation, or co-variation. To say that
there is some degree of co-variation between two variables is to assert that a
change in one will be accompanied by some degree of change, on the average, in
the other. However, it should be emphasized that the implications of
correlation also depend basically on the variability characteristic of the
relationship. That is, a low degree of correlation is indicative of a great
deal of scatter or variability of measures about the regression line, whereas
a high degree of correlation signifies relatively little scatter. The straight
line fitted to the scatter indicates the trend in the average change in the
values of one variable for given values of the other variable. The
product-moment correlation coefficient, r, is used as an index that represents
all these aspects of linear co-variation.

Inspection of Fig. 9:9, in which all the mean weights are plotted with respect
to the successive height measures of each class interval, reveals that, with
one exception, all the mean values cluster along a line that can be drawn most
simply as a straight line. The exception is the weight of those infants with a
height of 31½ inches (tenth column in the figure). The five infants with a
height of 31½ inches had an average weight of 23.2 pounds. However, since only
a few cases are involved, and the divergence of this mean from the straight
line is not very marked, this has no serious effect on the result.

The straight line drawn to the data in Fig. 9:9 has been fitted by inspection 
rather than by a mathematical method. Although a graph that shows the 
best-fitting regression lines is not ordinarily constructed in connection with 
the computation of r (see Section C), it is our purpose here to fit a straight line 
to a group of bi-variate data, estimate r from the result, and thereby illustrate 
what is basically involved in computing r. 

For the relation of y on x, the equation of a straight line has already been given as

$$y = mx + k$$

In the case of product-moment correlation, y represents measurements of the 
ordinate variable expressed as deviations from the mean ($y = Y - M_y$);
m is a constant denoting the slope of the straight line fitted to the means of 
the variation in y taken with respect to successive class interval values of x; 
and k represents the point of intercept on the ordinate axis of the straight 
line. For straight-line functions for bi-variate data, this point of intercept is 
at the intersection of the means of the y and x variables. Since the intersection 
of the means is taken as the origin of the coordinates, k is equal to zero. In 
other words, a straight line fitted to bi-variate data should pass through the 
origin if a linear function satisfactorily describes the relationship. 

The equation of the straight line fitted to bi-variate data thus becomes $y = mx$. In order to use this equation in estimating values of y (weights) from given values of x (heights), it is necessary to obtain a value for m that is equal to the slope of the fitted straight line. The slope of the straight line fitted to the data in Fig. 9:9 is equal to the tangent of the angle made by this straight line with the x-axis. This angle is designated as α, and its tangent is equal to the ratio of $y_1$ to $x_1$. For the data in Fig. 9:9, this ratio, and hence m, is approximately .73.



ESTIMATION OF PRODUCT-MOMENT r


213 


This value, .73, thus represents the slope of the straight line fitted to the data in Fig. 9:9. However, the actual value of the tangent of α depends upon how the two variables have been scaled in the correlation chart. Were it not for m's dependence upon the method of scaling, the preceding ratio could be used directly as a satisfactory estimate of the degree of correlation between the two variables. But this ratio is unsatisfactory in its present form: a value of correlation obtained by this method is capricious, since it depends upon the way in which the y variable and the x variable are scaled on the two coordinate axes. What is needed, therefore, is a method of scaling each of the two variables that will have the same implications for all such problems. Fortunately, this problem can be solved by converting the measures of both variables to z scores (cf. p. 177). That is, if the original measures for each variable are taken as deviations from their respective means and are expressed in terms of their respective standard deviations, the bi-variate data will be comparable in all problems for which linear correlation is appropriate. Furthermore, the slope of the best-fitting straight line for such transformed bi-variate data will have unambiguous implications; the value of the slope of the best-fitting straight line is thus the product-moment correlation coefficient, r.
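The dependence of the slope on scaling, and its removal by z scores, can be checked numerically. The sketch below is a minimal illustration using hypothetical height-weight pairs (not the data of Table 9:1) and hypothetical helper names, and it fits lines by least squares rather than by inspection. The slope in pounds per inch changes by a factor of 2.54 when the same heights are recorded in centimeters, while the slope fitted to z scores does not change, and it comes out the same whichever variable is treated as dependent; that common value is the product-moment r.

```python
import statistics

def ls_slope(xs, ys):
    """Hypothetical helper: least-squares slope of ys on xs (line through both means)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def z_scores(values):
    """Express each value as a deviation from the mean in units of sigma (divisor N)."""
    m, s = statistics.fmean(values), statistics.pstdev(values)
    return [(v - m) / s for v in values]

# Hypothetical height-weight pairs, not the data of Table 9:1
heights_in = [27.0, 28.0, 28.5, 29.0, 30.0, 31.0]
weights_lb = [18.0, 19.5, 21.0, 20.5, 23.0, 24.0]
heights_cm = [h * 2.54 for h in heights_in]     # same heights on a different scale

print(ls_slope(heights_in, weights_lb))         # pounds per inch
print(ls_slope(heights_cm, weights_lb))         # pounds per centimeter: a different number
print(ls_slope(z_scores(heights_in), z_scores(weights_lb)))  # scale-free slope
print(ls_slope(z_scores(heights_cm), z_scores(weights_lb)))  # unchanged by the rescaling
print(ls_slope(z_scores(weights_lb), z_scores(heights_in)))  # same with the roles reversed
```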


The z Score Correlation Chart 


In order to estimate r from the actual data in a correlation chart, the original values of each variable must be converted into z scores. We have seen that a z score is a deviation (x) taken in terms of the standard deviation of its distribution. Thus:

$$z_x = \frac{X - M_x}{\sigma_x}$$

And similarly,

$$z_y = \frac{Y - M_y}{\sigma_y}$$


There are two ways in which the height-weight data in Fig. 9:8 can be converted into z scores. Either each weight and height measure in Table 9:1 can be converted into $z_x$ and $z_y$, or a correlation chart can be constructed in which the limits of the intervals on the height and weight scales are set up both in terms of z and in terms of corresponding values of the original measures. When many cases are to be cross-tabulated, the first method is cumbersome and time-consuming. Consequently, the latter method will be used here; it is illustrated by the correlation chart shown in Fig. 9:10.

Conversion of Original Score Limits to z Score Limits 

In this correlation chart, the scales of the x and y variables have been made comparable in terms of their respective standard deviations. In other words, the scales of both variables have been converted into z scores. For illustrative purposes, the range of each class interval has been taken as equal to a z score value of .50, or half a standard deviation unit. The z score limits of each class interval for the x variable (height) are scaled at the bottom of the chart, and the corresponding original height score limits (in inches) are scaled at the top of the chart. Similarly, the weight variable is scaled in z score units at the right of the chart, and the original weights (in pounds) are scaled at the left. By this method, the original height and weight measures in Table 9:1 can be cross-tabulated in the correlation chart and at the same time converted to z score values without any additional computations. The original data are thus fed into an original score correlation chart, and they come out as z scores.

[Fig. 9:10. A z Score Correlation Chart for the Estimation of Product-moment r.* Top: height in inches; bottom: z score limits of class intervals for height; left: weight in pounds; right: z score limits for weight; bottom rows: $\Sigma z_y$ and the mean z score of y (weight) for class intervals of x (height). * Height-Weight Data of Table 9:1.]

The computations required in transforming the correlational frequencies of original scores into z scores consist in (1) computing the means and standard deviations of each variable, and (2) determining original score limits that correspond to the desired z score limits. On computation, the means and standard deviations of the height-weight variables are as follows:

mean weight = 21.8 pounds
mean height = 29.4 inches

σ weight = 2.48 pounds
σ height = 1.14 inches
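With the means and standard deviations just listed, the formulas for $z_x$ and $z_y$ give the z score of any infant directly. A short sketch; `to_z` and the constant names are hypothetical, and σ is taken as the population standard deviation (divisor N), the form used for σ throughout this chapter:

```python
def to_z(score, mean, sd):
    """Hypothetical helper: convert an original score to a z score, (score - mean) / sd."""
    return (score - mean) / sd

M_X, SD_X = 29.4, 1.14   # mean and sigma of height (inches), from the text
M_Y, SD_Y = 21.8, 2.48   # mean and sigma of weight (pounds), from the text

print(round(to_z(29.4, M_X, SD_X), 2))   # the mean itself: 0.0
print(round(to_z(27.0, M_X, SD_X), 2))   # a 27-inch infant: -2.11
for w in (17, 18, 20):                   # weights of the 27-inch infants
    print(w, round(to_z(w, M_Y, SD_Y), 2))
```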

It will be recalled that the z score value of the mean of a distribution is equal to zero, since a mean value does not deviate from itself. Thus, if $M_x = 29.4$,

$$z_x = \frac{29.4 - 29.4}{1.14} = 0$$

And since the z score value of any height score is equal to

$$z_x = \frac{X - M_x}{\sigma_x} = \frac{X - 29.4}{1.14}$$

any value of X in terms of z will be equal to

$$X = z_x\sigma_x + M_x \qquad [9:4]$$

To determine the value of an original score from a z score

Thus,

$$X = z_x(1.14) + 29.4$$

And the X value of z = 0 (the mean z score) will be:

$$X = 0(1.14) + 29.4 = 29.4 \quad (X \text{ when } z = 0)$$

The value of X can thus be obtained for any z score value by means of Formula 9:4, the conversion formula. For Fig. 9:10, the successive X values of the limits of the successive class intervals are needed. If these are obtained by starting at the mean of the distribution, the height measurement corresponding to a z score value of .25 will be equal to

$$X = .25(1.14) + 29.4 = 29.68 \quad (X \text{ when } z = .25)$$

This is the limit indicated at the top of the correlation chart in this figure for a point in the scale of height scores corresponding to a z score of .25 (at the bottom of the chart). Similarly, the height measurements corresponding to z score values of .75 and 1.25 are:

$$X = .75(1.14) + 29.4 = 30.26 \quad (X \text{ when } z = .75)$$

$$X = 1.25(1.14) + 29.4 = 30.82 \quad (X \text{ when } z = 1.25)$$

In this fashion the height measurements corresponding to the z score limits of each class interval in the height scale can be determined. However, an alternative and simpler procedure is as follows:

1. Determine the height measurements for $z_x = .25$ and $z_x = -.25$.

2. Determine the range in height units (inches) of a class interval equal to a range of .50 z score units (one-half a standard deviation).

3. Add the range value in inches obtained in the preceding step to the X value of z = .25, to find the value of z = .75 in inches. Continue adding in this fashion to determine the value in inches of each successive class-interval limit above the mean height.

4. Subtract the range value in inches of a class interval in height (obtained in the second step) from the X value of z = -.25, to find the value of z = -.75 in inches. Similarly, subtract the same range value in inches from the height value of z = -.75, to obtain the value of z = -1.25 in inches. Continue subtracting in this fashion to determine the value in inches of each successive class-interval limit below the mean height.
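These steps can be sketched in Python, computing every limit directly from Formula 9:4 rather than by repeated addition. This is a sketch only: `class_limits` is a hypothetical helper, and it assumes intervals .50 z wide with limits at ±.25, ±.75, and so on out to ±3.25, as in Fig. 9:10. Computed this way, each height interval is .5 × 1.14 = .57 inch wide. The .56 inch that the text derives next comes from subtracting the rounded limits 29.68 and 29.12 (unrounded, 29.685 and 29.115), so limits built by repeated addition drift slightly from the exact ones toward the ends of the scale.

```python
def class_limits(mean, sd, width=0.5, top=3.25):
    """Hypothetical helper: original-score limits of z intervals `width` wide,
    symmetric about the mean.

    Limits fall at z = +/-width/2, +/-3*width/2, ... out to +/-top, and each one is
    computed with Formula 9:4, X = z * sd + mean.
    """
    limits = {}
    z = width / 2
    while z <= top + 1e-9:            # small tolerance guards against float drift
        limits[z] = z * sd + mean
        limits[-z] = -z * sd + mean
        z += width
    return dict(sorted(limits.items()))

for z, x in class_limits(29.4, 1.14).items():
    print(f"z = {z:+.2f}   X = {x:.2f} inches")
```

The same function gives the weight limits when called with the weight mean and standard deviation.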


This procedure can be summarized as follows: 

Where the z interval equals .50, the range in inches of any z score interval for the height variable is equal to the difference between the height values of z = .25 and z = -.25. The value of X when z = .25 has already been found to be 29.68. The value of X when z = -.25 is as follows:

$$X = -.25(1.14) + 29.4 = 29.12$$

Therefore, the difference between 29.68 and 29.12 gives the range in inches of any class interval whose range is equal to one-half a z score unit. This difference, .56 inch, can now be used as a constant amount to be added to and subtracted from the height measures corresponding to z scores of .25 and -.25. Thus, the values in inches of the successive class-interval limits above the mean are as follows:

1. Range value, in inches, of class intervals equal to .50 z score units is .56.

2. When z = .25, X = 29.68 inches

3. When z = .75, X = 29.68 + .56 = 30.24 inches

4. When z = 1.25, X = 30.24 + .56 = 30.80 inches

5. When z = 1.75, X = 30.80 + .56 = 31.36 inches

6. When z = 2.25, X = 31.36 + .56 = 31.92 inches

7. When z = 2.75, X = 31.92 + .56 = 32.48 inches

8. When z = 3.25, X = 32.48 + .56 = 33.04 inches

Ordinarily, it is unnecessary to carry the limits beyond z = ±3.25 because in most distributions values beyond this point do not occur. In the height data in Fig. 9:8 the maximum height is 32.50 inches, which is therefore included in the class interval whose upper limit is equal to a z score of 3.25.

The height values of the limits of the class intervals below the mean of the distribution of height scores are found by the same method, except that the constant value of .56 inch is subtracted from the height value of the successive limits. Thus,

9. When z = -.25, X = 29.12 inches

10. When z = -.75, X = 29.12 - .56 = 28.56 inches

11. When z = -1.25, X = 28.56 - .56 = 28.00 inches

12. When z = -1.75, X = 28.00 - .56 = 27.44 inches

13. When z = -2.25, X = 27.44 - .56 = 26.88 inches

14. When z = -2.75, X = 26.88 - .56 = 26.32 inches

15. When z = -3.25, X = 26.32 - .56 = 25.76 inches

A similar procedure is used to convert the ordinate scale of weights into intervals whose limits will correspond to the intervals on the z score scale at the right of the chart in Fig. 9:10. Any value of Y in terms of z is equal to:

$$Y = z_y\sigma_y + M_y \qquad [9:4a]$$

Since $M_y = 21.8$ lbs. and $\sigma_y = 2.48$ lbs.,

$$Y = z_y(2.48) + 21.8$$

The original score values of the class-interval limits are indicated at the left of the chart.

With both scales of original measures converted into z score intervals, the final step in making a z score correlation chart consists in cross-tabulating the original data (Table 9:1) into the chart shown in Fig. 9:10. The scales at the left and top of this chart are the reference scales for the tally. This figure shows the final result for the infants' heights and weights in terms of the number of correlational frequencies per cell, rather than the tally itself. The z score interval for any case can now be readily determined by referring to the scales at the right and bottom of the chart.
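Tallying an original score into the z score chart amounts to finding which .50-wide z interval it falls in. The sketch below uses a hypothetical helper, `z_midpoint`, and assumes half-open intervals, so a score lying exactly on a limit is placed in the interval above it (the text does not specify a convention). It returns the interval mid-point used in the computations that follow:

```python
import math

def z_midpoint(score, mean, sd, width=0.5):
    """Hypothetical helper: mid-point of the z interval containing `score`.

    Intervals are [-0.25, 0.25), [0.25, 0.75), ..., so mid-points are 0, +/-0.5, +/-1.0, ...
    """
    z = (score - mean) / sd
    return width * math.floor(z / width + 0.5)

# The 27-inch infants of Fig. 9:8, using the means and sigmas given in the text
print(z_midpoint(27.0, 29.4, 1.14))                            # height column: -2.0
print([z_midpoint(w, 21.8, 2.48) for w in (17, 17, 18, 20)])   # [-2.0, -2.0, -1.5, -0.5]
```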

The Regression Line for $z_y$ on $z_x$

Once the correlational frequencies of original data are reorganized into a z score correlation chart like that in Fig. 9:10, a regression line can be fitted to the data, and from this line the product-moment correlation coefficient can be directly estimated. The procedure for estimating r consists in computing the average values of the z scores of one variable that are associated with the successive class-interval values of the other variable.* Such computations for the original measures were made in Fig. 9:9 for the variation of the original weight scores of the y variable with respect to the original height scores of the x variable. The same relationship of y with respect to x will now be developed in terms of z scores.

The z score limits of each of the 13 class intervals of variables x and y have 
been selected in Fig. 9:10 so as to yield mid-point values that are convenient 
for arithmetical manipulation. Thus, the mid-point values of each class 
interval, beginning with the lower end of each scale, are as follows: 

-3.0, -2.5, -2.0, -1.5, -1.0, -.50, 0, .50, 1.0, 1.5, 2.0, 2.5, 3.0 

These, then, are the z score values to be used in working with the correlational frequencies of the z score correlation chart. Fig. 9:10 shows that there are four infants whose height measurements are within the interval whose mid-point value is -2.0, in other words, whose mid-point value is 2.0 standard deviation units below the mean of the height distribution. The weights of these four infants, when converted to z scores, vary from -0.5 to -2.0. The average of the z score variation of their weight is therefore computed as follows:

2 infants with weight equal to z scores of -2.0 = -4.0
1 infant with weight equal to z score of -1.5 = -1.5
1 infant with weight equal to z score of -0.5 = -0.5

Σ = -6.0
Mean = -1.50
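The same averaging can be written as a frequency-weighted mean of the interval mid-points in one column of the chart. A minimal sketch; `column_mean` is a hypothetical name, and a column is represented as a mapping from z mid-point to frequency:

```python
def column_mean(cells):
    """Hypothetical helper: mean z score of one column, given {z_midpoint: frequency}."""
    n = sum(cells.values())                          # number of cases in the column
    total = sum(z * f for z, f in cells.items())     # the sum of z scores
    return total / n

# Column with height mid-point z_x = -2.0 (four infants), from Fig. 9:10
col = {-2.0: 2, -1.5: 1, -0.5: 1}
print(column_mean(col))  # -1.5
```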

The sum of the z score weights of these four infants is thus -6.0, and the average of these four variations is a z score of -1.50. Similarly, the mean z score weight of the infants with a z score of -1.5 for height is found to be -1.25; the mean z score weight of those with a z score height of -1.0 is -.75. These mean values, as well as those for the remaining intervals of height, are indicated at the bottom of the correlation chart in Fig. 9:11.

* That these are average values is indicated by the use of the bar above the symbol for the dependent variable.

[Fig. 9:11. Estimation of Product-Moment r from a Straight-Line Function for $z_y$ on $z_x$. Labels: z scores for height; mean $z_y$ for each height interval (bottom row). From data of Fig. 9:10.]

Estimating r

The matrix in Fig. 9:11 is the same as the z score correlation chart in Fig. 9:10. The means of the variation in z score weights for the successive class intervals of height have been plotted, and a straight line has been fitted, by inspection, to these means. The slope of this straight-line function is again (as in Fig. 9:9) equal to the tangent of the angle α made by this best-fitting straight line with the abscissa. Since the scales of the variables x and y have been made comparable in terms of z scores, we shall now symbolize the slope of the straight line by r. We find that r is approximately equal to the following ratio:

$$r_{yx} = \frac{z_y}{z_x} = \frac{2.0}{3.0} = .67$$


This value, .67, is the slope of the straight line fitted to the bi-variate data in Fig. 9:11. It is the estimated value of r, the product-moment correlation coefficient.
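Least squares can replace fitting the line by inspection. If a line through the origin is fitted to the column means of $z_y$, each mean weighted by its column frequency, its slope is $\Sigma(z_x z_y)/\Sigma z_x^2$. With exact (ungrouped) z scores, $\Sigma z_x^2 = N$, so the slope equals r. The sketch below uses hypothetical data and hypothetical helper names. When class-interval mid-points replace the exact z scores, as in Fig. 9:11, the agreement is only approximate.

```python
import statistics
from collections import defaultdict

def z_scores(values):
    """z scores using the population standard deviation (divisor N)."""
    m, s = statistics.fmean(values), statistics.pstdev(values)
    return [(v - m) / s for v in values]

def slope_through_column_means(zx, zy):
    """Hypothetical helper: least-squares slope b of z_y = b * z_x (line through
    the origin), fitted to the column means of z_y weighted by column frequency."""
    columns = defaultdict(list)
    for x, y in zip(zx, zy):
        columns[x].append(y)          # equal heights give identical z_x keys
    num = sum(x * len(ys) * statistics.fmean(ys) for x, ys in columns.items())
    den = sum(x * x * len(ys) for x, ys in columns.items())
    return num / den

def mean_z_product(xs, ys):
    """Product-moment r computed as the mean of z_x * z_y."""
    return statistics.fmean(a * b for a, b in zip(z_scores(xs), z_scores(ys)))

# Hypothetical data with repeated heights, so that columns contain several cases
heights = [27.0, 27.0, 28.0, 28.0, 28.0, 29.0, 29.0, 30.0, 30.0, 31.0]
weights = [17.0, 19.0, 18.5, 20.0, 21.0, 20.5, 22.0, 22.5, 24.0, 24.5]
zx, zy = z_scores(heights), z_scores(weights)

print(round(slope_through_column_means(zx, zy), 4))
print(round(mean_z_product(heights, weights), 4))   # same value
```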


The Regression Equation of $z_y$ on $z_x$

The product-moment correlation coefficient, r, is thus seen to be the slope of a straight-line function fitted to bi-variate data that have been converted into comparable deviate scales, each of which is taken in terms of the standard deviation of its variable. The regression equation by which the z score values of the y variable can be estimated from the z score values of the x variable is as follows:

$$z_y = r_{yx} z_x \qquad [9:5]$$

Regression equation of $z_y$ on $z_x$

For the height-weight relationship in Fig. 9:11, this linear equation is

$$z_y = .67 z_x$$

In order to estimate actual weight scores from given height scores, it is necessary to express the preceding equation in terms of the original height and weight score values. If X symbolizes any original height measure and Y any original weight measure, we have

$$\frac{Y - M_y}{\sigma_y} = r_{yx}\frac{X - M_x}{\sigma_x} \qquad [9:5a]$$

Rearranging the terms of this linear equation in order to solve for Y from any given value of X, we have

$$Y - M_y = \frac{r_{yx}\sigma_y(X - M_x)}{\sigma_x} \qquad [9:5b]$$

or, as usually expressed for convenience in computing,

$$Y = r_{yx}\frac{\sigma_y}{\sigma_x}(X - M_x) + M_y \qquad [9:6]$$

Regression equation of Y on X (original score form)


Substituting the values already obtained for the means and standard deviations of each variable, and using the value of r estimated from the regression line in Fig. 9:11, we have:

$$\begin{aligned} Y &= .67\,\frac{2.48}{1.14}(X - 29.4) + 21.8 \\ &= 1.46(X - 29.4) + 21.8 \\ &= 1.46X - 21.12 \end{aligned}$$


The regression equation for the data in Fig. 9:11, from which the infants' weights can be estimated from any given height value, is thus

$$Y = 1.46X - 21.12$$

where X is the height in inches and Y is the estimated average weight in pounds. Infants with a height of 29 inches (the mid-point height value for the interval ranging from 28.75 to 29.25 inches) would, on the average, have the following weight in pounds:

$$Y = 1.46(29.0) - 21.12 = 21.2 \text{ pounds}$$

Thus, on the basis of the estimate of correlation obtained from the straight line fitted to the height-weight data in Fig. 9:10, we would expect infants whose height was 29 inches to have an average weight of 21.2 pounds.
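Formula 9:6 with the values used in the text reduces to a slope and an intercept. A sketch with a hypothetical helper, `regression_y_on_x`. Because it does not round the slope to 1.46 before computing the intercept, its intercept is about -21.05 rather than the text's -21.12, but the estimate for a 29-inch infant still rounds to 21.2 pounds:

```python
def regression_y_on_x(r, mx, sx, my, sy):
    """Hypothetical helper: slope and intercept of Formula 9:6,
    Y = r * (sy / sx) * (X - mx) + my."""
    b = r * sy / sx          # regression coefficient of Y on X
    return b, my - b * mx    # intercept = M_y - b * M_x

# Values estimated in the text: r = .67, heights in inches, weights in pounds
b, a = regression_y_on_x(r=0.67, mx=29.4, sx=1.14, my=21.8, sy=2.48)
print(round(b, 2))              # 1.46
print(round(b * 29.0 + a, 1))   # 21.2 (pounds, for a 29-inch infant)
```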


The Regression Equation in Descriptive Statistics*

From the point of view of descriptive statistics the preceding predictive estimate is unnecessary, because the average weight of infants whose height was between 28.75 and 29.25 inches can be determined directly from the original data. The basic value of the method of correlation for descriptive statistics is not the regression equation per se, from which values of one variable can be estimated from those of another, but rather the correlation coefficient as an index that summarizes the degree of correlation between bi-variates.

* For its use in sampling and analytical statistics, see chap. 16.

The purpose of introducing at this time the regression equation for product- 
moment r has been to emphasize the following points: 

1. The fact that product-moment correlation is based on the assumption that 
a straight-line function will adequately describe the relationship between 
two variables. 

2. The fact that r, the product-moment correlation coefficient, can be estimated by means of a graphic method.

3. The fact that r is in reality the slope of a straight line fitted to a bi-variate distribution set up in terms of z scores.

4. The fact that there are two regression equations for each bi-variate relationship.

The Regression of $z_x$ on $z_y$

We shall now describe the basis for the second straight-line function and the corresponding regression equation for the height-weight data in Fig. 9:8. Having considered the development of the relation of $z_y$ to $z_x$ (the way in which weights vary with respect to successive class-interval height measurements), we shall discuss the relationship of $z_x$ to $z_y$, in other words, how the height of the infants varies with respect to their weight measurements. The necessary correlational data for this relationship are already cross-tabulated in the z score correlation chart (Fig. 9:10). We shall use this z score matrix again to determine the means of the variations in height with respect to different weights. The results are presented in Fig. 9:12.

[Fig. 9:12. Estimation of Product-Moment r from a Straight-Line Function for $z_x$ on $z_y$. Labels: z scores for height; mean $z_x$ for each weight interval (right-hand column). From data of Fig. 9:10.]

The correlational frequencies in Fig. 9:10 indicate that there were four infants whose weight was in the z score interval ranging from -2.25 to -1.75. Two of these infants had z scores for height that also ranged from -2.25 to -1.75. The other two, however, had z scores for height ranging from -1.75 to -1.25. The mean z score for the height of these four infants is therefore -1.75. This mean is noted opposite the bottom row at the right of Fig. 9:12. Similarly, the next class interval of the weight z scores (-1.75 to -1.25) in Fig. 9:10 shows that there were 15 infants in this weight group, and their z scores for height varied from -2.00 (mid-point value) to zero. The mean of their z scores, -.80, is likewise noted at the right of Fig. 9:12. The mean z scores for height for the remaining weight groups are also indicated at the right of this chart. Each mean is then plotted in the matrix in the chart, and a straight line is drawn by inspection to these means, giving the regression line of $z_x$ on $z_y$. From the slope of this line, a second estimate of the correlation between x and y can be made. The coefficient, r, is equal to the tangent of the angle made by the regression line and the ordinate axis (α in Fig. 9:12). The tangent is therefore equal to the following ratio:

$$r_{xy} = \frac{z_x}{z_y} = \frac{1.625}{2.5} = .65$$

The estimated value of the product-moment correlation coefficient from the relation of the x variable to the y variable is thus .65.


The Regression Equation for $z_x$ on $z_y$

The straight-line equation for the relationship of $z_x$ to $z_y$ is as follows:

$$z_x = r_{xy} z_y = .65 z_y \qquad [9:7]$$

Regression equation of $z_x$ on $z_y$

This is the regression equation of x on y in terms of z scores. In order to convert it into a form from which values of X (height in inches) can be estimated from given values of Y (weight in pounds), we proceed as before; namely, we express the z score values in original scores as follows:

$$\frac{X - M_x}{\sigma_x} = r_{xy}\frac{Y - M_y}{\sigma_y} \qquad [9:7a]$$

From this we obtain the expression for the solution of X for any value of Y:

$$X = \frac{r_{xy}\sigma_x(Y - M_y)}{\sigma_y} + M_x \qquad [9:7b]$$

or, as usually expressed for convenience in computing,

$$X = r_{xy}\frac{\sigma_x}{\sigma_y}(Y - M_y) + M_x \qquad [9:8]$$

Regression equation of X on Y (original score form)


Substituting in this regression equation the values obtained for the means and standard deviations of the height and weight variables and the value of r estimated from the straight line in Fig. 9:12, we find the equation of X on Y to be:

$$\begin{aligned} X &= .65\,\frac{1.14}{2.48}(Y - 21.8) + 29.4 \\ &= 0.30(Y - 21.8) + 29.4 \\ &= .30Y + 22.86 \end{aligned}$$

This is therefore the regression equation for variations in the infants' height as associated with different values of weight. We would expect infants with a weight of, say, 26 pounds to have an average height of 30.7 inches:

$$X = .30(26.0) + 22.86 = 30.7 \text{ inches}$$
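Formula 9:8 can be evaluated the same way. A sketch with a hypothetical helper, `regression_x_on_y`, using the text's estimate r = .65. The slope is not rounded first, so the intercept comes out near 22.89 rather than 22.86; the estimate for a 26-pound infant still rounds to 30.7 inches:

```python
def regression_x_on_y(r, mx, sx, my, sy):
    """Hypothetical helper: slope and intercept of Formula 9:8,
    X = r * (sx / sy) * (Y - my) + mx."""
    b = r * sx / sy          # regression coefficient of X on Y
    return b, mx - b * my    # intercept = M_x - b * M_y

# Values estimated in the text: r = .65, heights in inches, weights in pounds
b_xy, a_xy = regression_x_on_y(r=0.65, mx=29.4, sx=1.14, my=21.8, sy=2.48)
print(round(b_xy, 2))                 # 0.3
print(round(b_xy * 26.0 + a_xy, 1))   # 30.7 (inches, for a 26-pound infant)
```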


The Regression Coefficients 

From the preceding, we have seen that there are two regression equations that describe the co-relationship of two variables. One equation describes the variations of the y variable with respect to the x variable, and the other describes the variations of the x variable with respect to the y variable. Expressed in z score form, these equations were found to be as follows:

$$z_y = r_{yx} z_x \qquad [9:5]$$

Regression equation of $z_y$ on $z_x$

$$z_x = r_{xy} z_y \qquad [9:7]$$

Regression equation of $z_x$ on $z_y$

In these equations, each measurement of the two variables is expressed in terms of its distance from the mean of its respective distribution, taken in units of the standard deviation of the distribution. The correlation coefficients, $r_{yx}$ and $r_{xy}$, of these two equations are identical with the regression coefficients and, when r is computed rather than estimated by inspection, with each other.

Regression equations are, however, more usually written in terms of x and y deviation measures, or in terms of original measures. We saw that the respective regression equations of Y on X and X on Y were obtained by expressing the z score equations in terms of original measures:

$$Y = r_{yx}\frac{\sigma_y}{\sigma_x}(X - M_x) + M_y \qquad [9:6]$$

Regression equation of Y on X

$$X = r_{xy}\frac{\sigma_x}{\sigma_y}(Y - M_y) + M_x \qquad [9:8]$$

Regression equation of X on Y

From these equations the average values of Y associated with X, or of X associated with Y, can be estimated. It should be observed, however, that the regression coefficients in Formulas 9:6 and 9:8 are not equal in value to the correlation coefficient itself; rather, the regression coefficient of the first equation is $r_{yx}\sigma_y/\sigma_x$ and the regression coefficient of the second is $r_{xy}\sigma_x/\sigma_y$.

These regression coefficients are usually symbolized by b, as follows:

$$b_{yx} = r_{yx}\frac{\sigma_y}{\sigma_x} \qquad [9:9]$$

Regression coefficient of y on x

$$b_{xy} = r_{xy}\frac{\sigma_x}{\sigma_y} \qquad [9:10]$$

Regression coefficient of x on y


We have seen why m alone is not a suitable estimate of the slope of the best-fitting straight line for bi-variate data that are not reduced to comparable deviation scales (in terms of their respective standard deviations). We now see that an appropriate mathematical adjustment can be made in determining the regression coefficients by means of the ratio of the standard deviations of the respective distributions. Only if this adjustment has already been made by converting original measures into z scores will r, the correlation coefficient, also be equal to the regression coefficient. When this conversion has been made, the regression coefficient is symbolized by the Greek letter beta:

$$\beta_{yx} = r_{yx} \qquad [9:11]$$

Regression coefficient of $z_y$ on $z_x$

$$\beta_{xy} = r_{xy} \qquad [9:12]$$

Regression coefficient of $z_x$ on $z_y$


Regression Equations Expressed in Terms of x and y


Regression equations are often expressed in terms of the deviations x and y, where $x = X - M_x$ and $y = Y - M_y$. Since $z_y = y/\sigma_y$ and $z_x = x/\sigma_x$, expressing the regression equations in Formulas 9:5 and 9:7 in terms of x and y gives the following:

$$\frac{y}{\sigma_y} = r_{yx}\frac{x}{\sigma_x}$$

or, for the solution of y:

$$y = \frac{r_{yx}\sigma_y}{\sigma_x}\,x = r_{yx}\frac{\sigma_y}{\sigma_x}\,x \qquad [9:13]$$

Regression equation of y on x

This, then, is the regression equation of y on x, the regression coefficient being the same as in the equation in Formula 9:6.



And

$$\frac{x}{\sigma_x} = r_{xy}\frac{y}{\sigma_y}$$

or, for the solution of x:

$$x = \frac{r_{xy}\sigma_x}{\sigma_y}\,y = r_{xy}\frac{\sigma_x}{\sigma_y}\,y \qquad [9:14]$$

Regression equation of x on y

Again the regression coefficient in the above formula is the same as that in Formula 9:8.


Standard Formula for r 


The last set of regression equations, Formulas 9:13 and 9:14, is expressed in terms of the deviations x and y. The standard formulation of the product-moment correlation coefficient is also expressed in these same terms, as indicated earlier in this chapter:

$$r_{xy} = \frac{\Sigma(xy)/N}{\sigma_x\sigma_y} = \frac{\Sigma(xy)}{N\sigma_x\sigma_y} \qquad [9:1]$$

Product-moment r

where Σ(xy) is the algebraic sum of the products of the deviation values, x and y, obtained for each pair of associated measures, N is the number of paired frequencies, and $\sigma_x$ and $\sigma_y$ are the standard deviations of the variables correlated. The correlation coefficient, $r_{xy}$, is thus formulated as the ratio of the mean of the product deviations, Σ(xy)/N, to the product of the standard deviations of the two variables.
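Formula 9:1 translates directly into code. A minimal sketch; `r_from_deviations` is a hypothetical name, the standard deviations use the divisor N as in the formula, and the five data pairs are made up for illustration:

```python
import math

def r_from_deviations(xs, ys):
    """Hypothetical helper for Formula 9:1: r = sum(x*y) / (N * sigma_x * sigma_y),
    x and y being deviations from the respective means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    dx = [v - mx for v in xs]                   # x = X - M_x
    dy = [v - my for v in ys]                   # y = Y - M_y
    sx = math.sqrt(sum(d * d for d in dx) / n)  # sigma_x (divisor N)
    sy = math.sqrt(sum(d * d for d in dy) / n)  # sigma_y (divisor N)
    return sum(a * b for a, b in zip(dx, dy)) / (n * sx * sy)

X = [1, 2, 3, 4, 5]   # hypothetical paired measures
Y = [2, 1, 4, 3, 5]
print(round(r_from_deviations(X, Y), 2))   # 0.8
```

Here Σ(xy) = 8, N = 5, and $\sigma_x = \sigma_y = \sqrt{2}$, so r = 8/10 = .80.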

In order for the preceding formula to be used in computing the product-moment correlation coefficient, each original pair of measures being correlated must be converted into x and y deviations, either at the beginning or toward the end of the computations. We shall see in the next section that in the short method for computing r this conversion is made toward the end. That is, instead of each original measure being converted into its respective x and y value, the initial computations are made from the original measures themselves, and the value of the numerator, Σ(xy), is obtained as one of the final steps in the computation (see Fig. 9:13).

The correlation coefficient can also be computed from the z score measures of a bi-variate distribution. In this case, r is the mean of the products of each associated pair of z scores:

$$r_{xy} = \frac{\Sigma(z_x z_y)}{N} \qquad [9:15]$$

r from z scores
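Formula 9:15 gives the same coefficient from z scores. A sketch, again with a hypothetical function name and the same made-up pairs, computing z scores with the divisor N, as for σ in Formula 9:1:

```python
import statistics

def r_from_z(xs, ys):
    """Hypothetical helper for Formula 9:15: r is the mean of the products
    z_x * z_y, with z scores based on the population sigma (divisor N)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.pstdev(xs), statistics.pstdev(ys)
    return statistics.fmean(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))

X = [1, 2, 3, 4, 5]   # hypothetical paired measures
Y = [2, 1, 4, 3, 5]
print(round(r_from_z(X, Y), 2))   # 0.8
```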


These two formulas for r symbolize what is basically involved in computing the product-moment correlation coefficient. They represent two processes
that can be used to obtain the slope of the straight line which best fits the 
bi-variate data. Any method for computing r is essentially a mathematical 



COMPUTATION OF PRODUCT-MOMENT r 225 

procedure for obtaining the best-fitting straight line by the method of least squares, a method that fits the line so that the sum of the squared errors of fit is at a minimum. Consequently, from the slope of such a straight line a precise value for $r_{xy}$ can be obtained.

In estimating the value of r from regression lines fitted to bi-variate data, we have seen that a determination can be made from the regression either of y on x or of x on y. Inasmuch as these lines are fitted by inspection, they cannot be expected to yield exactly the same values for r. However, if they are fitted carefully, the difference between the two values of r should not exceed .05. On the other hand, in computing the value of r, only one coefficient is obtained. It can be regarded as the correlation either between x and y or between y and x. When the errors of fitting regression lines to bi-variate data are reduced to a minimum, as is the case when r is computed, the slopes of both lines are identical, and therefore $r_{yx} = r_{xy}$. In practice, the subscripts for r are usually written as xy, the notation yx being discarded.

C. COMPUTATION OF PRODUCT-MOMENT r 

Summary of Mathematical Implications of r 

In the preceding section we described a method for estimating Pearson’s 
product-moment correlation coefficient from the cross-tabulated data of two 
variables. We saw that the estimating process is not particularly difficult, 
once its implications are clear. Before proceeding with methods for computing 
r, it will be relevant to review the meaning of the correlation coefficient from 
a mathematical point of view. 

When the data of each variable have been converted to comparable z score scales, r is the slope of two regression lines (linear functions) fitted to the variations of y with respect to x and to the variations of x with respect to y. When the correlation coefficient is estimated from the cross-tabulated data of a z score correlation chart, two values for the correlation coefficient are obtained, one from the slope of the regression of $z_y$ on $z_x$ and the other from the slope of the regression of $z_x$ on $z_y$. These two values need not be identical, because the fitting of regression lines by inspection cannot yield a precise result. On the other hand, when the product-moment correlation coefficient is computed, only one coefficient is obtained. It is a mathematical average of the respective slopes of $z_y$ on $z_x$ and $z_x$ on $z_y$. Furthermore, in effect, it is computed by the mathematical method of least squares. This means that the straight-line functions are fitted (algebraically, not graphically) to the bi-variate data in such a way as to reduce the errors of fit to a minimum.* This


* For the mathematical nature of the method of least squares, see M. Phillip, The Prin¬ 
ciples of Finance and Statistical MaihemaiicSt Prentice-Hall, New York, rev. ed., 1941, 
chap. 14; and C. C. Peters and W. R. Van Voorhis, Statistical Procedures and Their Mathe¬ 
matical Basis, McGraw-Hill, New York, 1940, p. 299. 



226 THE PRODUCT-MOMENT METHOD FOR THE CORRELATION OF VARIATES 


is why the computed value of r provides a more accurate index of correlation 
than does the estimate from a straight-line function fitted by inspection. 
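The statement that, in z score units, both least-squares regression slopes coincide with r can be checked numerically. The sketch below is a minimal illustration on a small hypothetical data set (the numbers are invented, not taken from the text), and the helper names such as `least_squares_slope` are illustrative only. Like the text, it uses N, not N - 1, in the standard deviation.

```python
import math

def mean(values):
    return sum(values) / len(values)

def pop_sd(values):
    """Standard deviation with N (not N - 1) in the denominator, as in the text."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def z_scores(values):
    m, s = mean(values), pop_sd(values)
    return [(v - m) / s for v in values]

def least_squares_slope(xs, ys):
    """Least-squares slope of ys regressed on xs, in deviation form: sum(xy) / sum(x^2)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def pearson_r(xs, ys):
    """Formula 9:1: mean of the product deviations over the product of the sigmas."""
    mx, my = mean(xs), mean(ys)
    mean_product = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return mean_product / (pop_sd(xs) * pop_sd(ys))

# Hypothetical paired measures (invented for illustration).
xs = [2, 4, 5, 7, 9, 10]
ys = [3, 4, 6, 6, 9, 13]

r = pearson_r(xs, ys)
zx, zy = z_scores(xs), z_scores(ys)
b_zy_on_zx = least_squares_slope(zx, zy)  # regression of z_y on z_x
b_zx_on_zy = least_squares_slope(zy, zx)  # regression of z_x on z_y
print(round(r, 4), round(b_zy_on_zx, 4), round(b_zx_on_zy, 4))

# In raw-score units the two slopes differ, yet r is their geometric mean.
b_yx = least_squares_slope(xs, ys)
b_xy = least_squares_slope(ys, xs)
print(round(b_yx, 4), round(b_xy, 4), round(math.sqrt(b_yx * b_xy), 4))
```

The "average" mentioned in the text shows up here as a geometric mean: in raw units the slopes are $r\sigma_y/\sigma_x$ and $r\sigma_x/\sigma_y$, and their product is $r^2$.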

Various Methods for the Computation of r 

The computation of r involves only two steps in addition to those described 
in Chapter 7 for computing the mean and standard deviation of a variable. 
The first of these two steps has already been described in the present chapter— 
the now familiar cross-tabulation, in a correlation chart, of the data of the 
two variables to be correlated. However, this step is often carried out only 
algebraically and not by means of a correlation chart. (Cf. Table 9:2.) The 
second step consists in computing the products of the deviations for each pair 
of associated measures in the bivariate distribution, and then summing and 
averaging these products. 

We have seen (Formula 9:1) that the product-moment correlation coefficient is algebraically equal to the ratio of the mean of the product deviations to the product of the standard deviations of the two variables being correlated. Many methods of computation have been devised for obtaining this ratio. In this chapter we shall describe the following methods, which are among the most commonly used procedures:

I. Product-moment r computed from ungrouped data (long method).

II. Product-moment r computed from grouped data (short method). 

III. Product-moment r computed from ungrouped data (machine 
method). 

The relative advantages and disadvantages of these three methods will be 
apparent as we proceed. 

D. METHOD I: PRODUCT-MOMENT r FROM UNGROUPED
DATA (LONG METHOD) 

In the long method for ungrouped data, the mean and standard deviation of each variable are computed and the original measures are each expressed as deviations (x and y) from their respective means. The products of the deviation values of each pair of measures are then obtained, and all the products for the bi-variate distribution are summed to obtain Σ(xy). With these computations made, the correlation coefficient is readily obtained, being equal to the ratio of the mean of the product deviations to the product of the standard deviations of the two distributions. Thus,

$$ r_{xy} = \frac{\Sigma(xy)/N}{\sigma_x \sigma_y} = \frac{\Sigma(xy)}{N \sigma_x \sigma_y} \qquad [9{:}1] $$

Pearson's product-moment correlation coefficient

If the σ's in this formula are expressed in deviation terms, $\sigma_x = \sqrt{\Sigma x^2/N}$ and $\sigma_y = \sqrt{\Sigma y^2/N}$, the computations can be simplified by canceling out the N's:

$$ r_{xy} = \frac{\Sigma(xy)}{N\sqrt{\dfrac{\Sigma x^2}{N}}\sqrt{\dfrac{\Sigma y^2}{N}}} = \frac{\Sigma(xy)}{\sqrt{\Sigma x^2 \, \Sigma y^2}} \qquad [9{:}16] $$

Pearson r (alternate form for computation)

where $\Sigma x^2$ represents the sum of the squared deviations of the x variable, and $\Sigma y^2$ represents the sum of the squared deviations of the y variable.


Order of Operations for Method I 

Method I is illustrated in Table 9:2. For purposes of simplification, the data of only 20 cases have been used. The paired associates in this table represent the scores made by 20 persons on two separate administrations of a digit-span test.* The scores made on the first administration are given as the x variable and represent the average result of four trials. The average scores received by each person on the second administration of the test are given as the y variable, and likewise represent the average of four trials.

The order of computation in Table 9:2 is as follows. (The similarity to the 
procedure already developed for computing the mean and standard deviation 
from ungrouped data will be apparent.) 

1. The data of each of the variables are arranged by associated pairs in two adjacent columns (columns 2 and 3 of the table). In the case of these data, the basis for each associated pair is the subject taking both administrations of the test.

2. The mean of each variable is next obtained by summing the scores of each variable and dividing by N, the number of measures.

3. The deviations of the measures of each variable from their respective means are obtained and entered in adjacent columns (columns 4 and 5), to yield the deviations x and y, where $x = X - M_x$ and $y = Y - M_y$. Care is necessary in differentiating the positive and negative deviations.

4. The deviations of each variable are squared (columns 6 and 7) to give $x^2$ and $y^2$ for computing the standard deviations.

5. The products of each associated pair of x and y deviations are obtained to give the product deviations (column 8). The signs of the deviations must be carefully noted in computing these products so that the correct sign will be entered in the column.

6. The product deviations of the last column are summed algebraically and then averaged (divided by N) to give the mean of the product deviations. This is the necessary value for the numerator of r in Formula 9:12. The ratio of this value to the product of the standard deviations of each variable gives the correlation coefficient, r.
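The six steps above can be sketched in a few lines of Python, using the paired digit-span scores of Table 9:2 (subjects A through T, in order). Step 1 corresponds to keeping the two lists aligned by subject. The function name `r_long_method` is hypothetical; the sketch computes r both by Formula 9:1 and by the simplified Formula 9:16 to show that they agree.

```python
import math

# Table 9:2 digit-span scores: first test (X) and retest (Y), subjects A to T.
X = [8.5, 6.75, 6.25, 6.0, 7.0, 9.25, 5.5, 9.25, 7.75, 5.0,
     6.5, 7.75, 9.0, 7.5, 7.0, 7.75, 5.75, 8.75, 8.75, 9.0]
Y = [8.25, 8.25, 7.75, 5.5, 7.25, 9.25, 5.75, 8.25, 6.75, 5.25,
     7.25, 8.5, 8.0, 8.25, 7.75, 8.25, 5.75, 9.5, 8.5, 9.0]

def r_long_method(X, Y):
    n = len(X)
    # Step 2: the mean of each variable.
    mean_x, mean_y = sum(X) / n, sum(Y) / n
    # Step 3: deviations from the respective means.
    x = [v - mean_x for v in X]
    y = [v - mean_y for v in Y]
    # Step 4: squared deviations, summed.
    sum_x2 = sum(d * d for d in x)
    sum_y2 = sum(d * d for d in y)
    # Step 5: product deviations, summed algebraically.
    sum_xy = sum(a * b for a, b in zip(x, y))
    # Step 6 (Formula 9:1): mean product deviation over the product of the sigmas.
    sigma_x = math.sqrt(sum_x2 / n)
    sigma_y = math.sqrt(sum_y2 / n)
    r_9_1 = (sum_xy / n) / (sigma_x * sigma_y)
    # Formula 9:16: the same ratio with the N's canceled.
    r_9_16 = sum_xy / math.sqrt(sum_x2 * sum_y2)
    return sum_xy, r_9_1, r_9_16

sum_xy, r1, r16 = r_long_method(X, Y)
print(round(sum_xy, 4), round(r1, 2), round(r16, 2))  # 26.8375 0.84 0.84
```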

* These data are from J. G. Peatman and N. M. Locke, “Studies in the Methodology of the Digit-Span Test,” Archives of Psychology, Monograph No. 167, 1934. The correlation coefficient obtained here between the digit-span scores of two separate administrations of the test constitutes an index of the reliability of the test, by the method of test-retest. (Cf. chap. 17, Section B.)





Table 9:2. Computation of the Product-Moment Correlation Coefficient (r): Long Method with Ungrouped Data

(Correlation of Digit-Span Test Scores for 20 College Students: Test and Retest)*

              Digit-Span Scores     Deviations from Mean    Deviations Squared     Product
  Subjects    1st Test   Retest       of X        of Y                             Deviations
                 X          Y           x           y         x²         y²          xy
    (1)         (2)        (3)         (4)         (5)        (6)        (7)         (8)
     A          8.5        8.25        1.05         .60     1.1025      .3600       .6300
     B          6.75       8.25        -.70         .60      .4900      .3600      -.4200
     C          6.25       7.75       -1.20         .10     1.4400      .0100      -.1200
     D          6.0        5.5        -1.45       -2.15     2.1025     4.6225      3.1175
     E          7.0        7.25        -.45        -.40      .2025      .1600       .1800
     F          9.25       9.25        1.80        1.60     3.2400     2.5600      2.8800
     G          5.5        5.75       -1.95       -1.90     3.8025     3.6100      3.7050
     H          9.25       8.25        1.80         .60     3.2400      .3600      1.0800
     I          7.75       6.75         .30        -.90      .0900      .8100      -.2700
     J          5.0        5.25       -2.45       -2.40     6.0025     5.7600      5.8800
     K          6.5        7.25        -.95        -.40      .9025      .1600       .3800
     L          7.75       8.5          .30         .85      .0900      .7225       .2550
     M          9.0        8.0         1.55         .35     2.4025      .1225       .5425
     N          7.5        8.25         .05         .60      .0025      .3600       .0300
     O          7.0        7.75        -.45         .10      .2025      .0100      -.0450
     P          7.75       8.25         .30         .60      .0900      .3600       .1800
     Q          5.75       5.75       -1.70       -1.90     2.8900     3.6100      3.2300
     R          8.75       9.5         1.30        1.85     1.6900     3.4225      2.4050
     S          8.75       8.50        1.30         .85     1.6900      .7225      1.1050
     T          9.0        9.00        1.55        1.35     2.4025     1.8225      2.0925
  N = 20       149.0      153.0         0           0      34.0750    29.9250     26.8375
               (ΣX)       (ΣY)       (Check      (Check     (Σx²)      (Σy²)       (Σxy)
                                       sum)        sum)

Mean_x = 149.0/20 = 7.45          σ_x = √(34.0750/20) = 1.305
Mean_y = 153.0/20 = 7.65          σ_y = √(29.9250/20) = 1.223

Mean of Product Deviations = 26.8375/20 = 1.3419

Correlation Coefficient: r_xy = 1.3419 / (1.305)(1.223) = 1.3419/1.5960 = .84

* These 20 cases were drawn randomly from the original group of 142 subjects. The coefficient for the total group was .86, as compared with .84 for these 20 cases.

In the example in Table 9:2, the mean of the product deviations is 1.342; 
the product of the standard deviations of the two variables is 1.596; the ratio 
of 1.342 to 1.596 is .84; and r is therefore .84. There is consequently a marked 
positive relationship in digit-span performance on the two administrations 
of the test for these 20 subjects. 





































Shortcomings of Method I 

This method of computing r has two shortcomings. (1) The amount of 
arithmetical work is unnecessarily laborious, as was the case in computing 
the standard deviation by the long method. (2) Of real importance in the case 
of unfamiliar data, the form of the bi-variate distribution cannot readily be 
seen unless a scattergram or correlation chart of the data is made. This is 
likely to prove a serious handicap at times, inasmuch as the basic assumption 
in the method of product-moment correlation is that the relationship between 
two variables is linear. Unless the data are cross-tabulated, the investigator 
cannot be sure that a straight-line function will be appropriate for the bi-variate data. Hence, the long method for ungrouped data should be avoided
except in the case of familiar variables when one can be confident that a linear 
function will satisfactorily describe the relationship. 
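Why the linearity assumption matters can be seen in a small hypothetical case: below, y is completely determined by x, but the relationship is curvilinear (y = x²), and the product-moment coefficient computed without looking at the scattergram is zero. The data are invented for illustration; the helper is a direct transcription of Formula 9:16.

```python
import math

def pearson_r(xs, ys):
    """Formula 9:16: sum of product deviations over the root of the product of the sums of squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]     # y depends perfectly on x, but not linearly
print(pearson_r(xs, ys))     # 0.0
```

A correlation chart of these pairs would show the U-shaped pattern at once, which is exactly the information the long method conceals.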


E. METHOD II: PRODUCT-MOMENT r FROM GROUPED 
DATA (SHORT METHOD) 

Although a long method can be used for computing r from grouped data cross-tabulated in a correlation chart, it is so unnecessarily laborious that we shall omit it here and describe only the short method. The difference between the two is analogous to the difference in the two methods for computing the standard deviation described in Tables 7:8 and 7:10. Most of the steps in the short method have already been developed in computing the mean and the standard deviation (Table 7:10); in fact, aside from making an original cross-tabulation of the bi-variate data, only two additional steps are involved. They are described in Fig. 9:13, which illustrates Method II, and they involve the computation, row by row and column by column, of the sums of the deviations from the guessed mean of the other variable (Σx' and Σy'), and of the product deviations, Σf(x'y'). The correlation chart in Fig. 9:13 will be recognized as the same as that in Fig. 9:8, the cross-tabulation of the weights and heights of 151 girl infants.

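The formula for the computation of r by Method II is as follows:

$$ r_{xy} = \frac{\dfrac{\Sigma f(x'y')}{N} - \dfrac{\Sigma f(x')}{N}\cdot\dfrac{\Sigma f(y')}{N}}{\sqrt{\dfrac{\Sigma f(x'^2)}{N} - \left(\dfrac{\Sigma f(x')}{N}\right)^2}\,\sqrt{\dfrac{\Sigma f(y'^2)}{N} - \left(\dfrac{\Sigma f(y')}{N}\right)^2}} \qquad [9{:}17] $$

Pearson's r by short method with deviations from guessed means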


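This formula may also be written with the symbol c representing the operations already done in computing the mean and standard deviation by the short method for grouped data:

$$ r_{xy} = \frac{\dfrac{\Sigma f(x'y')}{N} - c_x c_y}{\sqrt{\dfrac{\Sigma f(x'^2)}{N} - c_x^2}\,\sqrt{\dfrac{\Sigma f(y'^2)}{N} - c_y^2}} \qquad [9{:}17a] $$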




In this form the numerator gives the mean of the product deviations from the 
actual means of both variables, and the denominator gives the standard 
deviations of each in unit interval terms. To obtain the standard deviations 
in original score terms, each root value in the denominator must be multiplied 
by i, the size of the respective class intervals. 
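Formula 9:17a can be sketched as a function that takes a correlation chart, represented here (an assumption of this sketch) as a Python dict mapping each cell's unit-interval deviations (x', y') to its frequency. The chart used is hypothetical, not the Fig. 9:13 data. As a cross-check, each cell is expanded into individual measures at assumed mid-points (guessed means 29.5 and 23.0, intervals 0.5 and 1.0) and Formula 9:1 is applied; the two results agree because the class intervals cancel, as the text explains.

```python
import math

def r_short_method(chart):
    """Formula 9:17a for a correlation chart.

    chart maps (x_dev, y_dev) -> frequency, where x_dev and y_dev are the
    unit-interval deviations x' and y' of a cell from the guessed means.
    """
    n = sum(chart.values())
    sum_fx = sum(f * xd for (xd, _), f in chart.items())
    sum_fy = sum(f * yd for (_, yd), f in chart.items())
    sum_fx2 = sum(f * xd * xd for (xd, _), f in chart.items())
    sum_fy2 = sum(f * yd * yd for (_, yd), f in chart.items())
    sum_fxy = sum(f * xd * yd for (xd, yd), f in chart.items())
    c_x, c_y = sum_fx / n, sum_fy / n                 # correction factors
    sigma_x_unit = math.sqrt(sum_fx2 / n - c_x ** 2)  # sigma in unit-interval terms
    sigma_y_unit = math.sqrt(sum_fy2 / n - c_y ** 2)
    return (sum_fxy / n - c_x * c_y) / (sigma_x_unit * sigma_y_unit)

# Hypothetical chart (not the Fig. 9:13 data).
chart = {(-1, -1): 3, (-1, 0): 1, (0, -1): 1, (0, 0): 4,
         (0, 1): 2, (1, 0): 1, (1, 1): 3, (2, 1): 1}
r = r_short_method(chart)

# Cross-check: expand each cell into individual measures at assumed mid-points
# and apply Formula 9:1 to the ungrouped values.
xs, ys = [], []
for (xd, yd), f in chart.items():
    xs += [29.5 + 0.5 * xd] * f
    ys += [23.0 + 1.0 * yd] * f
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sx = math.sqrt(sum((a - mx) ** 2 for a in xs) / n)
sy = math.sqrt(sum((b - my) ** 2 for b in ys) / n)
r_long = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (n * sx * sy)
print(round(r, 4), round(r_long, 4))
```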

The Frequency Distributions of Each Variable from the 
Correlation Chart 

The frequencies of the class intervals of each variable can be readily 
obtained from the cross-tabulated data. The summation of all the frequencies 
in each row of the chart in Fig. 9:13 gives the frequencies for each class 
interval of the y variable. Similarly, the summation of all the frequencies in 
each column gives the frequencies for each class interval of the x variable. 
These frequency summations are entered at the right and the bottom of the 
chart; thus the frequency distribution for the weight (y) variable is in the 
first column at the right, and the frequency distribution for the height (x) 
variable is in the first row at the bottom. The summation of each of these frequency distributions gives N, the total number of associated pairs or frequencies. The sum of the frequencies of each variable should, of course, be made independently so as to provide a check for the value of N. The value of N is entered in the square at the lower right-hand corner, just outside the correlation matrix.

The Standard Deviations of Each Variable from the 
Correlation Chart 

The frequency distributions of each variable having been obtained, the 
next step is to compute their respective standard deviations. The procedure is 
the same as that used in determining standard deviations by the short method 
from grouped data (Table 7:10). Appropriate columns and rows are provided 
at the right of and below the matrix of the correlation chart for entering the 
relevant computations. The y variable will be computed first, and then the x variable.

The unit interval deviations (y') from the guessed mean are entered in the column headed y' at the right of the correlation matrix. The products of the frequencies and the unit interval deviations from the guessed mean are next computed and entered in the column headed f(y'). The sum of these products is necessary for the computation of $c_y$, the correction factor for the deviations from the guessed mean. As we saw earlier,

$$ c_y = \frac{\Sigma f(y')}{N} $$

The products of the frequencies and the squared deviations from the guessed mean are entered in the column headed f(y'²). The sum of the products in this column gives the total of the squared deviations from the guessed mean. All the data necessary for computing the standard deviation of the y variable are now available.*

[Fig. 9:13. Computation of the Product-Moment Correlation Coefficient (r). Short Method: Grouped Data Cross-Tabulated in Correlation Chart. Height (x) and Weight (y) Measures of 151 One-Year-Old Girls, from Data of Table 9:1. Correlation chart with the x variable (height) in inches and the y variable (weight) in pounds.]

The sum of the frequencies times the unit interval deviations from the guessed mean, Σf(y'), is equal to -185, and N is 151. The correction, $c_y$, is therefore -1.23:

$$ c_y = \frac{\Sigma f(y')}{N} = \frac{-185}{151} = -1.23 $$

The sum of the squared deviations from the guessed mean, Σf(y'²), is 1155, and the standard deviation for weight is 2.48 pounds:

$$ \sigma_y = i_y\sqrt{\frac{\Sigma f(y'^2)}{N} - c_y^2} = 1.0\sqrt{\frac{1155}{151} - (-1.23)^2} = 2.48 $$

The size of the class interval, $i_y$, is equal to 1.0.

The same procedure is used for computing the standard deviation of the x variable (height), and is indicated at the bottom of the chart. The unit deviations (x') of each class interval from the guessed mean are entered in the second row below the correlation matrix. The products of the frequencies for each class interval and their respective unit deviations, f(x'), are entered in the third row. The sum of these products divided by the number of cases gives the correction, $c_x$:

$$ c_x = \frac{\Sigma f(x')}{N} = \frac{-21}{151} = -.14 $$

The products of the frequencies and the squared deviations, f(x'²), are entered in the fourth row. The standard deviation for height is found to be 1.14 inches:

$$ \sigma_x = i_x\sqrt{\frac{\Sigma f(x'^2)}{N} - c_x^2} = 0.5\sqrt{\frac{\Sigma f(x'^2)}{151} - (-.14)^2} = 0.5(2.27) = 1.135 $$

where 0.5 is equal to $i_x$, the size of the class interval for this particular variable.

As already indicated, r, the product-moment correlation coefficient, is the ratio of the average of the product deviations to the product of the standard deviations. We now have the computations necessary for determining the denominator of this ratio, namely, the product of the standard deviations:

$$ \sigma_x\sigma_y = (1.135)(2.48) = 2.81 $$

The computation of the ratio necessary to express the value of r can be simplified by omitting $i_x$ and $i_y$ (the size of the class intervals) from the final computations. They can be omitted because they will cancel out algebraically from both the numerator and the denominator of the ratio for r. The product of the standard deviations that will be used is as follows:

$$ \sigma_{x'}\sigma_{y'} = (2.27)(2.48) = 5.63 $$

* An additional column may be added at the right of the chart in order to apply Charlier's check for the sums needed in computing σ (cf. Table 7:10).





The prime signs are used with the x and y subscripts to indicate that the
product of the standard deviations has been obtained for unit intervals of 
deviation, rather than for actual intervals of the original variables. 

The Product Deviations 

The next step is to make the necessary computations for the numerator of the ratio for r. In other words, we need to obtain the product of the deviations for each pair of associated measures, sum these products, and calculate their mean:

$$ \text{Mean of product deviations} = \frac{\Sigma f(x'y')}{N} $$

There are several methods of obtaining the product deviations from the
cross-tabulations of a correlation chart. The method shown in Fig. 9:13 is in 
general use and provides not only an independent check of the sum of the 
product deviations, but at the same time a check for each of the correction 
factors, Cx and Cy. These values are obtained from the sum of the last two 
columns at the right of the matrix and of the last two rows at the bottom of 
the matrix. The computations required for the product deviations developed 
from the columns at the right will be described first. 

The next to the last column, headed Σx', gives for all the cases in each class interval (row) of the y (weight) variable the sum of the deviations of the other variable (x) from the guessed mean of x. Thus, according to Fig. 9:13, in the highest class interval of the weight variable, y, there are three cases with an average weight of 28 pounds. Two of them are two unit intervals above (to the right of, on the chart) the guessed mean of the height variable, x, and the third case is six unit intervals above. Hence, the sum of the deviations of these three cases from the guessed mean of the x variable is equal to

$$ 2(2) + 1(6) = 10 $$

This figure is entered in the next to the last column at the right of the chart.

The next class interval of the y variable, with a mid-point of 27 pounds, has three cases. The first is one unit interval below (to the left of, on the chart) the guessed mean of x and therefore has an x' value of -1. The second is two unit intervals, and the third is six unit intervals, above the guessed mean of the x variable. The algebraic sum of the deviations of these three cases is equal to:

$$ -1 + 2 + 6 = 7 $$

The cases in each of the remaining class intervals of the y variable (rows) are in turn summed with respect to their unit interval deviations from the guessed mean of the other variable, x, and these sums are entered in the next to the last column at the right. The sum of the entries in this column gives the sum of all the deviations in the distribution from the guessed mean of x, and should be equal to the sum already obtained in the usual manner (third row below the matrix in Fig. 9:13). In other words, the total of the Σx' column must equal the total of the f(x') row:

$$ \Sigma(\Sigma x') = \Sigma f(x') $$

The computation of the product deviations is now simple since we have the sum of the deviations from the guessed mean of x for each of the class intervals of y. The products, x'y', for each class interval of y are readily obtained by multiplying y' and Σx', because, algebraically,

$$ y'(\Sigma x') = \Sigma(x'y') $$

for all frequencies in each row. These products are entered in the last column at the right of the chart. The sum of all these product deviations gives the total of the product deviations for the two variables, Σf(x'y'). For the height-weight data of the 151 infants, this is equal to 595. The average of this, 595/151, is 3.94, and is the mean of the product deviations from the guessed means of each variable.

It is now necessary to correct for the fact that these product deviations were obtained from the guessed means rather than from the actual means. This correction is the product of the correction factors, $c_x$ and $c_y$; in other words,

$$ c_x c_y = (-.14)(-1.23) = .17 $$

It has already been pointed out that the final computation of r can be simplified by omitting the original size of the class intervals of each variable, viz., $i_x$ and $i_y$, from the ratio. Hence, the average of the product deviations from the actual means of each variable, the class intervals being expressed as unit deviations from their means, is:

$$ \frac{\Sigma f(x'y')}{N} - c_x c_y = \frac{595}{151} - (-.14)(-1.23) = 3.94 - .17 = 3.77 $$


Ratio for r 

The correlation coefficient, r, is now readily computed, since it is the ratio of the mean of the product deviations to the product of the standard deviations of each variable. For the data in Fig. 9:13, r is equal to .670:

$$ r_{xy} = \frac{\text{Mean of product deviations}}{\text{Product of standard deviations}} = \frac{3.77}{5.63} = .670 $$
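The arithmetic of the last few paragraphs can be reproduced from the summary figures reported for Fig. 9:13. One assumption is needed: the text gives σ for height in unit-interval terms (2.27) but not Σf(x'²), so that value is taken as reported rather than recomputed. Working without intermediate rounding, the sketch lands on the same coefficient, about .67.

```python
import math

N = 151
sum_fy, sum_fy2 = -185, 1155   # y (weight) columns at the right of Fig. 9:13
sum_fx = -21                   # x (height) rows below the matrix
sum_fxy = 595                  # sum of the product deviations
sigma_x_unit = 2.27            # reported sigma_x'; sum of f(x'^2) is not given in the text

c_y = sum_fy / N                                     # correction for y
c_x = sum_fx / N                                     # correction for x
sigma_y_unit = math.sqrt(sum_fy2 / N - c_y ** 2)     # sigma_y' (i_y = 1.0)
mean_product = sum_fxy / N - c_x * c_y               # mean product deviation about the actual means
r = mean_product / (sigma_x_unit * sigma_y_unit)     # Formula 9:17a, class intervals omitted
print(round(c_y, 2), round(c_x, 2), round(sigma_y_unit, 2), round(mean_product, 2), round(r, 3))
```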

Checking Σf(x'y')

Ordinarily, before computing the ratio for the correlation coefficient, it is wise to check the calculations of the product deviations. Such a check can be made independently by computing the sum of the product deviations with respect to the class intervals of the x variable as well as of the y variable. These computations are given in the last two rows below the correlation matrix in Fig. 9:13. This time the sum of the deviations from the guessed mean of the y variable is obtained for each class interval of the x variable. Thus, the first column of data in the correlation chart shows four cases for the class interval of height that has a mid-point of 27 inches. All four cases are below the guessed mean of the other variable, y: the first, three unit intervals below; the second, five intervals below; and the remaining two, six intervals below. The sum of these four deviations is

$$ -3 + (-5) + 2(-6) = -20 $$

This figure for the column is entered in the fifth row below the chart. The deviations from the guessed mean of the y variable for the cases in each of the other eleven class interval columns of the x variable are obtained similarly and entered in their respective columns in the fifth row below the chart. The grand total of all these sums is -185, a value that checks with that already obtained in the f(y') column at the right of the chart.

The sum of the product deviations is now checked by multiplying the Σy' values of each column by x'. These products are entered in the last row at the bottom of the chart, and their grand total is obtained to give Σf(x'y'). This total, 595, confirms the correctness of the sum of the product deviations already obtained in the last column at the right of the chart.
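The row-wise and column-wise routes to Σf(x'y') can be sketched as follows. Only the cells the text actually describes are included: the 28-pound row (y' = 5), the 27-pound row (y' = 4), and the 27-inch column (x' = -5, from the guessed mean of 29.5 inches and i = 0.5). The rest of Fig. 9:13 is not reproduced here, so the totals are not the chart's 595; the point is that the two routes always agree. The function name is hypothetical.

```python
def row_and_column_checks(chart):
    """Compute the sum of f(x'y') two ways from a correlation chart.

    chart maps (x_dev, y_dev) -> frequency in unit-interval deviations.
    """
    row_sums, col_sums = {}, {}
    for (xd, yd), f in chart.items():
        row_sums[yd] = row_sums.get(yd, 0) + f * xd  # sum of x' within each y' row
        col_sums[xd] = col_sums.get(xd, 0) + f * yd  # sum of y' within each x' column
    by_rows = sum(yd * s for yd, s in row_sums.items())  # sum of y'(sum x'), last column
    by_cols = sum(xd * s for xd, s in col_sums.items())  # sum of x'(sum y'), last row
    return row_sums, col_sums, by_rows, by_cols

# Partial chart: only the cells described in the text.
partial_chart = {
    (2, 5): 2, (6, 5): 1,                    # 28-lb row
    (-1, 4): 1, (2, 4): 1, (6, 4): 1,        # 27-lb row
    (-5, -3): 1, (-5, -5): 1, (-5, -6): 2,   # 27-inch column
}
row_sums, col_sums, by_rows, by_cols = row_and_column_checks(partial_chart)
print(row_sums[5], row_sums[4], col_sums[-5])  # 10 7 -20
print(by_rows == by_cols)                        # True
```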

Means and Standard Deviations from the Correlation Chart 

It has already been observed that the method just used for calculating the product-moment correlation coefficient does not directly employ the actual means and standard deviations of the original distributions of the two variables being correlated. These two measures, which are ordinarily needed for descriptive and analytical purposes, are obtained by the short method for grouped data already described in Chapter 7. Thus, for the data in Fig. 9:13,

$$ \text{Mean}_x = \text{G.M.}_x + i_x c_x $$

$$ \text{Mean height} = 29.5 + (0.5)(-.14) = 29.4 \text{ inches } (x \text{ variable}) $$

$$ \text{Mean}_y = \text{G.M.}_y + i_y c_y $$

$$ \text{Mean weight} = 23.0 + 1.0(-1.23) = 21.8 \text{ pounds } (y \text{ variable}) $$

$$ \text{Standard deviation of height} = i_x \sigma_{x'}: \quad \sigma_x = 0.5(2.27) = 1.14 \text{ inches} $$

$$ \text{Standard deviation of weight} = i_y \sigma_{y'}: \quad \sigma_y = 1.0(2.48) = 2.48 \text{ pounds} $$





F. METHOD III: PRODUCT-MOMENT r FROM UNGROUPED 
DATA (MACHINE METHOD) 

Machine Computation 

The method for computing the product-moment correlation coefficient to be described in this section has an advantage over both Methods I and II in that the original measures can be directly employed. No conversion to deviation values, x and y, is necessary. The underlying principle is similar to that of Method II, for all computations are made from guessed means and then corrected. However, as implied by the title of this method, the original data are not cross-tabulated into a correlation chart; hence Method III may at times present the same disadvantage as Method I, viz., the form which a bi-variate distribution takes cannot be seen unless a scattergram of the data is constructed.

Method III is particularly valuable when machines are available for computations and all the inter-correlations of a number of variables are required. In the machine procedure, the original data are usually punched on cards (cf. the I.B.M. cards, Chapter 2), one card being used for each subject. After the data are punched, the cards are fed into a machine that is set to make the necessary multiplications and additions. The totals needed for the computation of r by Formula 9:18 are obtained directly from the machine totals, and the remaining calculations are facilitated by an electric calculator.

Method III often proves valuable even though punch cards and machines are not available. This is the case provided the total number of paired frequencies, N, is not greatly in excess of 100, and provided an adding or calculating machine and tables of squares and square roots (see Table I, Appendix C) are available.

This method also has the important advantage of embodying a systematic set of checks for all computations.* These checks are particularly necessary to ensure the accuracy of the computations, because the multiplication and addition of many numbers are involved.
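Table 9:3 carries columns for X + Y + Z and (X + Y + Z)² alongside the squares and cross-products. One identity such a check can rest on (an assumption about how the columns are meant to be used; the details of Hull's system are not given here) is that the total of the squared check sums equals the total of the squares plus twice the total of the cross-products. The sketch uses only the first six subjects of Table 9:3, since the rest of the table lies outside this excerpt.

```python
# First six subjects of Table 9:3: (X, Y, Z).
rows = [(15, 30, 9), (13, 24, 10), (11, 24, 5), (9, 20, 7), (17, 26, 9), (7, 15, 6)]

sum_check_sq = sum((x + y + z) ** 2 for x, y, z in rows)      # column (6)
sum_squares = sum(x * x + y * y + z * z for x, y, z in rows)  # columns (7) to (9)
sum_cross = sum(x * y + x * z + y * z for x, y, z in rows)    # columns (10) to (12)

# (X + Y + Z)^2 = X^2 + Y^2 + Z^2 + 2(XY + XZ + YZ), so the totals must agree.
print(sum_check_sq, sum_squares + 2 * sum_cross)  # 11509 11509
```

An arithmetic slip in any square or cross-product breaks the equality, which is what makes the extra columns worth their cost.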


The Guessed Means Taken as Equal to Zero 

In Method III a short cut is employed, in that deviations are taken from guessed means. However, in contrast to Method II, the guessed means are taken as equal to zero, as was done for the computation of M and σ in Method III in Chapter 7. Each original measure thus becomes a deviation value from a guessed mean: each X and Y becomes x' and y', respectively, although the actual values of the original scores are not changed. All computations are made from the original measures. Since this method utilizes the short-cut device of guessed means taken as equal to zero, the final computations for r are analogous to those with Formula 9:17 in Method II.

* The system of checks used in connection with Method III has been adapted from Clark Hull, Aptitude Testing, World Book Co., Yonkers, 1928, pp. 427-439.

If any of the original measures of a bi-variate distribution are negative 
numbers, the original data must be modified before the guessed means are 
taken as zero. This is done by adding to each original measure a constant 
amount of sufficient size to convert all the original measures into positive 
numbers. After this conversion is made, the method proceeds in the manner 
to be described. The only additional correction necessary, because of the 
use of converted values, arises in computing the final value of the mean of the 
distribution. The standard deviation will not be affected, since the addition of a constant amount to each measure of a distribution leaves the value of σ unchanged; and because the deviations themselves are unchanged, neither is the value of r.
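The effect of adding a constant to make all measures positive can be sketched directly: the mean shifts by the constant, while σ and r stay the same. The scores and the constant 10 below are hypothetical.

```python
import math

def mean_sd_r(xs, ys):
    """Mean and sigma of xs (N in the denominator) and r between xs and ys."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    r = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n * sx * sy)
    return mx, sx, r

xs = [-4, -1, 0, 2, 5]   # hypothetical scores, some negative
ys = [1, 2, 2, 4, 6]
k = 10                   # constant large enough to make every x positive
shifted = [x + k for x in xs]

m0, s0, r0 = mean_sd_r(xs, ys)
m1, s1, r1 = mean_sd_r(shifted, ys)
print(m0, m1)                       # the mean moves by k
print(round(s0, 6), round(s1, 6))   # sigma is unchanged
print(round(r0, 6), round(r1, 6))   # r is unchanged
```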


The Formula for r (Method III) 

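The formula for computing r by Method III may be expressed in terms similar to those used in Formula 9:17:

$$ r_{xy} = \frac{\dfrac{\Sigma(x'y')}{N} - \dfrac{\Sigma x'}{N}\cdot\dfrac{\Sigma y'}{N}}{\sqrt{\dfrac{\Sigma x'^2}{N} - \left(\dfrac{\Sigma x'}{N}\right)^2}\,\sqrt{\dfrac{\Sigma y'^2}{N} - \left(\dfrac{\Sigma y'}{N}\right)^2}} \qquad [9{:}17b] $$

The above formula is identical with Formula 9:17, except for the omission of the f's, which denote the multiplication of the deviation values of each class interval by the number of frequencies. Since the data are not grouped into class intervals in Method III, this multiplication is unnecessary, and hence the f's do not appear in the formula.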

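When the guessed means of distributions of positive numbers are taken as equal to zero,

$$ x' = X \quad \text{and} \quad y' = Y $$

In other words, each original measure, its numerical value remaining unchanged, becomes a deviation from the guessed mean:

$$ x' = X - \text{G.M.}_x \quad \text{and} \quad y' = Y - \text{G.M.}_y, \qquad \text{with } \text{G.M.}_x = \text{G.M.}_y = 0 $$

If X and Y are substituted for x' and y' in Formula 9:17b, we have:

$$ r_{xy} = \frac{\dfrac{\Sigma XY}{N} - \dfrac{\Sigma X}{N}\cdot\dfrac{\Sigma Y}{N}}{\sqrt{\dfrac{\Sigma X^2}{N} - \left(\dfrac{\Sigma X}{N}\right)^2}\,\sqrt{\dfrac{\Sigma Y^2}{N} - \left(\dfrac{\Sigma Y}{N}\right)^2}} \qquad [9{:}18] $$

Pearson's r: alternate form of 9:17 for ungrouped data with guessed means equal to zero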





Since ΣX/N yields the mean of the x variable, and ΣY/N yields the mean of the y variable, Formula 9:18 may be rewritten as follows:

$$ r_{xy} = \frac{\dfrac{\Sigma XY}{N} - M_x M_y}{\sqrt{\dfrac{\Sigma X^2}{N} - M_x^2}\,\sqrt{\dfrac{\Sigma Y^2}{N} - M_y^2}} \qquad [9{:}18a] $$

But inasmuch as any sum divided by N, the number of measures yielding the sum, is a mean measure, all the terms of the above formula can be expressed as means:

$$ r_{xy} = \frac{M_{xy} - M_x M_y}{\sqrt{M_{x^2} - (M_x)^2}\,\sqrt{M_{y^2} - (M_y)^2}} \qquad [9{:}18b] $$

where $M_{xy}$ is the mean of the products of the paired original measures of x and y, $M_{x^2}$ is the mean of the squares of the original measures of the x variable, and $M_{y^2}$ is the mean of the squares of the original measures of the y variable. $M_x$ and $M_y$ are, as usual, the means of the measures of each distribution.
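Formula 9:18b can be sketched as a function that needs only the five running totals a tabulating machine would accumulate: ΣX, ΣY, ΣX², ΣY², and ΣXY. Applied to the digit-span scores of Table 9:2, it gives the same r as the long method, .84. The function name is hypothetical.

```python
import math

def r_machine_method(X, Y):
    n = len(X)
    # Running totals taken directly from the original measures (guessed means = 0).
    sum_x, sum_y = sum(X), sum(Y)
    sum_x2 = sum(v * v for v in X)
    sum_y2 = sum(v * v for v in Y)
    sum_xy = sum(a * b for a, b in zip(X, Y))
    # Formula 9:18b: every term expressed as a mean.
    m_x, m_y = sum_x / n, sum_y / n
    m_x2, m_y2, m_xy = sum_x2 / n, sum_y2 / n, sum_xy / n
    return (m_xy - m_x * m_y) / (math.sqrt(m_x2 - m_x ** 2) * math.sqrt(m_y2 - m_y ** 2))

# Table 9:2 digit-span scores (test X, retest Y), subjects A to T.
X = [8.5, 6.75, 6.25, 6.0, 7.0, 9.25, 5.5, 9.25, 7.75, 5.0,
     6.5, 7.75, 9.0, 7.5, 7.0, 7.75, 5.75, 8.75, 8.75, 9.0]
Y = [8.25, 8.25, 7.75, 5.5, 7.25, 9.25, 5.75, 8.25, 6.75, 5.25,
     7.25, 8.5, 8.0, 8.25, 7.75, 8.25, 5.75, 9.5, 8.5, 9.0]
print(round(r_machine_method(X, Y), 2))  # 0.84
```

With very large raw scores, subtracting the nearly equal terms in the numerator can lose precision; this is one practical reason for keeping the original measures small or for using guessed means near the actual means.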


Inter-Correlation Coefficients 


It has already been pointed out that Method III is labor-saving whenever 
inter-correlations between three or more variables are required. This is true 
because the sums of the original measures of each variable must be obtained 
only once, and the sums of the squares of the original measures of each variable must likewise be computed only once. The means are obtained from the
sums of the original measures; the standard deviations of each variable are 
computed from the sums of the squares. In the following pages we shall 
illustrate Method III for three variables, x, y, and z. However, in order to 
simplify the illustration, the data of only ten cases are used. 

The inter-correlation of three or more variables signifies that each variable
is correlated with every other one. If only three variables are to be
inter-correlated, only three coefficients will be needed: $r_{xy}$, $r_{xz}$, and
$r_{yz}$. However, the number of inter-correlation coefficients increases rapidly
with an increase in the number of variables to be inter-correlated. For n
variables, the number of inter-correlation coefficients is:

$$\text{Number of inter-}r\text{'s} = \frac{n(n-1)}{2} \qquad \text{[9:19]}$$

Number of inter-correlation coefficients between n variables

Thus if there are ten variables to be inter-correlated,

$$\frac{10(9)}{2} = 45$$

and 45 r's must be computed.
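Formula 9:19 counts the distinct pairs that can be drawn from n variables. A short Python sketch (the name `inter_r_count` is hypothetical) compares the formula with a direct enumeration of the pairs using the standard library's `itertools.combinations`:

```python
from itertools import combinations


def inter_r_count(n):
    """Number of inter-correlation coefficients among n variables (Formula 9:19)."""
    return n * (n - 1) // 2


# Direct enumeration of the pairs for three and for ten variables
three_pairs = list(combinations(["x", "y", "z"], 2))
ten_pairs = list(combinations(range(10), 2))

print(three_pairs)                        # [('x', 'y'), ('x', 'z'), ('y', 'z')]
print(inter_r_count(10), len(ten_pairs))  # 45 45
```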


In the example developed in Tables 9:3 to 9:6, three variables have been
inter-correlated to illustrate the labor-saving characteristic of Method III
when the inter-correlations of variables are to be obtained. As shown in
Table 9:3, the measures of each variable are summed only once. Similarly,
the squares of the measures of each variable are computed and summed only
once. When the inter-correlations of ten variables are required, the only
additional labor consists in computing the products of each pair of measures
for each inter-correlation coefficient. Since ten variables yield 45
inter-correlation coefficients, 45 columns will be required for their
cross-products. On the other hand, inter-correlating ten variables with
Method II requires 45 separate correlation charts, for each of the 45 sets of
bi-variate data must be cross-tabulated into a separate chart for the
computations described in Table 9:2. Thus it should be apparent that
Method III really saves labor when many inter-correlation coefficients are
needed, provided, of course, that a calculating or adding machine is available.

PRODUCT-MOMENT r FROM UNGROUPED DATA (AAACHINE METHOD) 239

Table 9:3. Method III for r: Work Table and Checks; Original Data,
Means, Squares, and Cross-Products

   (1)      (2)    (3)    (4)     (5)       (6)       (7)    (8)    (9)    (10)   (11)   (12)
           Original Scores      Check     Check     Squares for σ's       Cross-Products
Subjects                         Sums      Sums
                                           Squared
            X      Y      Z    X+Y+Z   (X+Y+Z)²     X²     Y²     Z²     XY     XZ     YZ
    1      15     30      9      54      2916      225    900     81    450    135    270
    2      13     24     10      47      2209      169    576    100    312    130    240
    3      11     24      5      40      1600      121    576     25    264     55    120
    4       9     20      7      36      1296       81    400     49    180     63    140
    5      17     26      9      52      2704      289    676     81    442    153    234
    6       7     15      6      28       784       49    225     36    105     42     90
    7      15     34     11      60      3600      225   1156    121    510    165    374
    8      14     27      8      49      2401      196    729     64    378    112    216
    9      16     36      9      61      3721      256   1296     81    576    144    324
   10      11     22      4      37      1369      121    484     16    242     44     88
Sums      128    258     78     464     22600     1732   7018    654   3459   1043   2096
(N = 10)   ΣX     ΣY     ΣZ  Σ(X+Y+Z) Σ(X+Y+Z)²   ΣX²    ΣY²    ΣZ²    ΣXY    ΣXZ    ΣYZ
Means    12.8   25.8    7.8    46.4    2260.0    173.2  701.8   65.4  345.9  104.3  209.6

The mean of the x variable: $M_x = 12.8$
The mean of the y variable: $M_y = 25.8$
The mean of the z variable: $M_z = 7.8$

Check I: $$\Sigma X + \Sigma Y + \Sigma Z = \Sigma(X + Y + Z)$$

128 + 258 + 78 = 464

Check II: $$M_x + M_y + M_z = M_{(x+y+z)}$$

12.8 + 25.8 + 7.8 = 46.4

Check III: $$\Sigma X^2 + \Sigma Y^2 + \Sigma Z^2 + 2[\Sigma XY + \Sigma XZ + \Sigma YZ] = \Sigma(X + Y + Z)^2$$

1732 + 7018 + 654 + 2(3459 + 1043 + 2096) = 22600

Check IV: $$M_{x^2} + M_{y^2} + M_{z^2} + 2(M_{xy} + M_{xz} + M_{yz}) = M_{(x+y+z)^2} = 2260$$

173.2 + 701.8 + 65.4 + 2(345.9 + 104.3 + 209.6) = 2260




























Work Sheet for Original Data and Computation of Means, 
Squares, and Cross-Products (Table 9:3) 

The first step in computing the correlation coefficient by Method III consists
in setting up a work sheet like that shown in Table 9:3. The subjects are
listed in column 1. The measures for each subject are entered in the columns
immediately to the right of this column, in columns 2, 3, and 4, since there
are three variables. Thus, Subject No. 1 had an original score of 15 on variable
x, an original score of 30 on variable y, and an original score of 9 on variable z.

Columns 5 and 6 are check columns with which to verify the accuracy of the
computations. Column 5 shows the sum of each subject's scores for all three
variables: 54 for Subject No. 1. Column 6 gives the square of each sum in
column 5. Thus, the square of Subject No. 1's 54 is 2916.

Columns 7, 8, and 9 show the squares of each subject's original measures,
which are needed for computing the standard deviations. Thus, in column 7,
225 is the square of Subject No. 1's score of 15 on variable x; the first
value, 900, in column 8 is the square of 30, the score he received on the
y variable; and the first number, 81, in column 9 is the square of his score
of 9 on the z variable.

Columns 10, 11, and 12 show the cross-products of each subject's original
scores. Column 10 lists the cross-products for the x and y variables. Thus the
first cross-product, 450, is the product of Subject No. 1's scores on variables
x and y, namely, 15 and 30. In column 11 the cross-products of the x and z
variables are listed; the first entry, 135, is the cross-product of his scores on
variables x and z, 15 and 9. The cross-products of the y and z variables are
listed in column 12, and 270, the first entry, is the cross-product of his y and
z scores, namely, 30 and 9.

A table of squares (see Table I, Appendix C) considerably facilitates the
computations for columns 6 through 9. The cross-products in columns 10
through 12 can be obtained with a calculating machine or a table of the
products of numbers.

Once all the data for each subject are entered in the work sheet, each column
is summed; an adding machine is an obvious advantage here. These totals
are entered in the next-to-last row of Table 9:3. The mean of each column is
next computed by dividing its total by N, the number of cases or subjects,
and the results are entered in the last row. When a series of sums is to be
divided by a constant such as N, it is usually simpler, if a calculating
machine is available, to multiply each sum by the reciprocal of N (written as
a decimal) than to divide each sum by N. The reciprocal of a number is equal
to the quotient obtained by dividing 1 by the number.* Thus:

$$\text{Reciprocal of } N = \frac{1}{N}$$

* Reciprocals of all integers from 1 to 1000 are given in Table I, Appendix C.




The product of the reciprocal of N and any other number is equal to the
quotient of the latter number divided by N. Thus, where $\Sigma X$ represents
any other number:

$$\frac{\Sigma X}{N} = \Sigma X \left(\frac{1}{N}\right) = \Sigma X \,(\text{reciprocal of } N)$$

If $\Sigma X = 342$ and N is 50:

$$\frac{342}{50} = 342\left(\frac{1}{50}\right) = 342(.02) = 6.84$$

For the data in Table 9:3, the use of reciprocals to obtain the means of the
sums of each column is obviously unnecessary, because the total number of
cases is ten. The division of each sum by 10 requires only a shifting of the
decimal point one place to the left.

The mean of the x variable is seen to be 12.8; that of the y variable, 25.8;
and that of the z variable, 7.8.

Checks for the Computations in Table 9:3 

Before any of the checks now to be described for Table 9:3 are used, and
before columns 5 through 12 are computed, the original entries in columns 2
through 4 should be checked. If any errors are made in entering the original
measures in these three columns, the ensuing computations cannot be correct,
and the checks will not reveal such errors.

The following four checks are required to insure the correctness of the
computations in Table 9:3:

Check I:

$$\Sigma X + \Sigma Y + \Sigma Z = \Sigma(X + Y + Z)$$

128 + 258 + 78 = 464

464 = 464

This check establishes the correctness of the additions in columns 2 through
4, which are used to obtain the means of the distributions. It is based on
the fact that the total of the column sums in any table of numbers must
equal the total of the row sums in the table.

Check II:

$$M_x + M_y + M_z = M_{(x+y+z)}$$

12.8 + 25.8 + 7.8 = 46.4

46.4 = 46.4

Check II is based on the same principle as Check I, and verifies the
computation of the means of columns 2 through 4.

Check III:

$$\Sigma X^2 + \Sigma Y^2 + \Sigma Z^2 + 2[\Sigma XY + \Sigma XZ + \Sigma YZ] = \Sigma(X + Y + Z)^2$$

1732 + 7018 + 654 + 2(3459 + 1043 + 2096) = 22600

22600 = 22600





Check III, which establishes the correctness of the remaining computations
in the table, is particularly valuable because it detects any error in
squaring the original measures or in calculating their cross-products. This
check rests on the fact that the square of the sum of a series of numbers is
equal to the sum of the squares of the numbers plus twice the sum of all
their cross-products (taken two at a time). For one subject,
$(X + Y + Z)^2 = X^2 + Y^2 + Z^2 + 2(XY + XZ + YZ)$; for Subject No. 1,
2916 = 225 + 900 + 81 + 2(450 + 135 + 270). Summing this identity over all
subjects gives Check III. The check sum, which is obtained from column 6,
is 22,600 for the data in Table 9:3.

Check IV:

$$M_{x^2} + M_{y^2} + M_{z^2} + 2(M_{xy} + M_{xz} + M_{yz}) = M_{(x+y+z)^2}$$

173.2 + 701.8 + 65.4 + 2(345.9 + 104.3 + 209.6) = 2260

2260 = 2260

Check IV establishes the correctness of the means of columns 7 through 12,
and is based upon the same principle as Check III. The check value for the
means is the mean of column 6.

Before proceeding with the computation of the standard deviations in
Table 9:4, we shall summarize the order of operations in Table 9:3.

Summary of Operations: Table 9:3

1. Number each subject in the group consecutively (column 1).

2. Record each subject's original scores for each of the three variables in the
X, Y, and Z columns (columns 2, 3, and 4). Check all these entries for
accuracy.

3. Sum each subject's scores and record each sum in column 5, headed
(X + Y + Z). Square each of these sums and enter the results in column 6,
headed (X + Y + Z)².

4. Square each original score of X in column 2 and enter it in column 7,
headed X². Similarly, square each original score of Y and Z in columns 3
and 4 respectively, and enter the squared values in columns 8 and 9,
headed Y² and Z².

5. Compute the cross-products of each subject's scores for all variables
taken two at a time. Enter these cross-products in columns 10, 11, and 12,
headed XY, XZ, and YZ.

6. Sum all the columns of the table.

7. Obtain the mean of each column by dividing its sum by N, the total
number of cases (in this case N = 10).

8. Apply Checks I, II, III, and IV as indicated. The computations in Table 9:3,
made from the original scores, and the means of all the columns are correct
if the results satisfy the four checks. (Note: If decimals are dropped from
the mean values of each column, Checks II and IV may not be exact.)
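The eight operations above map fairly directly onto a short program. The Python sketch below is an illustration, not the author's procedure; `work_table` and `require` are hypothetical names. It builds the column totals of Table 9:3 and applies Checks I to IV, using exact fractions for the means so that Checks II and IV hold exactly and the dropped-decimal caveat in step 8 does not arise.

```python
from fractions import Fraction
from itertools import combinations


def require(condition, label):
    """Raise an error naming the check that failed."""
    if not condition:
        raise ValueError(f"{label} failed")


def work_table(scores):
    """Column totals of a Table 9:3-style work sheet, verified by Checks I-IV.

    scores maps a one-letter variable name to that variable's original
    scores, listed in the same subject order for every variable.
    """
    names = list(scores)
    n = len(scores[names[0]])
    col_sum = {v: sum(scores[v]) for v in names}                        # cols 2-4
    col_sq = {v: sum(s * s for s in scores[v]) for v in names}          # cols 7-9
    col_xp = {u + v: sum(a * b for a, b in zip(scores[u], scores[v]))   # cols 10-12
              for u, v in combinations(names, 2)}
    row_sum = [sum(scores[v][i] for v in names) for i in range(n)]      # col 5
    chk_sum = sum(row_sum)                                               # total of col 5
    chk_sq = sum(t * t for t in row_sum)                                 # total of col 6

    def m(total):
        # Exact mean, so no decimals are dropped
        return Fraction(total, n)

    require(sum(col_sum.values()) == chk_sum, "Check I")
    require(sum(m(t) for t in col_sum.values()) == m(chk_sum), "Check II")
    require(sum(col_sq.values()) + 2 * sum(col_xp.values()) == chk_sq, "Check III")
    require(sum(m(t) for t in col_sq.values())
            + 2 * sum(m(t) for t in col_xp.values()) == m(chk_sq), "Check IV")
    return col_sum, col_sq, col_xp, chk_sum, chk_sq


scores = {
    "x": [15, 13, 11, 9, 17, 7, 15, 14, 16, 11],
    "y": [30, 24, 24, 20, 26, 15, 34, 27, 36, 22],
    "z": [9, 10, 5, 7, 9, 6, 11, 8, 9, 4],
}
col_sum, col_sq, col_xp, chk_sum, chk_sq = work_table(scores)
print(col_sum, col_sq, col_xp, chk_sum, chk_sq)
```

As in the text, these checks cannot catch a score that was copied wrongly into the input lists; they only verify the arithmetic that follows.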




Computation of Standard Deviations of All Variables (Table 9:4) 

The computation of the standard deviation of an ungrouped distribution
by the short method, when the original scores are taken as deviations from
a guessed mean equal to zero, may be formulated as follows:

$$\sigma_v = \sqrt{M_{v^2} - (M_v)^2} \qquad \text{[9:20]}$$

Standard deviation with guessed mean of variable taken equal to zero

where the subscript v is used to represent any of the variables being
inter-correlated, $M_{v^2}$ is the mean of the deviations squared (here, the
squared original scores), and $M_v$ is the mean of the distribution. This
formulation is for use with the means obtained from Table 9:3 and is the same
as the following:

$$\sigma_v = \sqrt{\frac{\Sigma V^2}{N} - \left(\frac{\Sigma V}{N}\right)^2} \qquad \text{[9:20a]}$$

This is identical with Formula 7:6a.

The variables being inter-correlated, and for which the standard deviations 
are therefore necessary, are listed in column 13 of Table 9:4. The means of the 
distributions of each variable are listed in column 14, their values having 
been obtained from Table 9:3. Each of these means is squared in column 15. 
In column 16 are entered the values of the means of the squared deviations 
from columns 7 through 9 of Table 9:3. 

Table 9:4. Standard Deviations for All Variables in Table 9:3

$$\sigma_v = \sqrt{M_{v^2} - (M_v)^2}$$

(Where v Stands for Any Variable)

   (13)         (14)           (15)          (16)              (17)                (18)
 Variable      Means          Means         Means of          Variance            Standard
          (from Table 9:3)   Squared        Squares              σ²              Deviation σ
                                        (from Table 9:3)
    v           M_v           (M_v)²        M_{v²}         M_{v²} - (M_v)²     √(M_{v²} - (M_v)²)
    x          12.8           163.84        173.2              9.36                3.06
    y          25.8           665.64        701.8             36.16                6.01
    z           7.8            60.84         65.4              4.56                2.14
Check sums     46.4           890.32        940.4             50.08               11.21

The standard deviation of the x variable: $\sigma_x = 3.06$
The standard deviation of the y variable: $\sigma_y = 6.01$
The standard deviation of the z variable: $\sigma_z = 2.14$

Check V:

$$\Sigma M_{v^2} - \Sigma (M_v)^2 = \Sigma \sigma_v^2$$

940.4 - 890.32 = 50.08











The variance, $\sigma^2$, of each distribution (the square of the standard
deviation) is entered in column 17 and is equal to the difference between the
mean of the squared deviations (original scores) and the square of the mean of
the distribution. The latter value, obtained from column 15, is the square of
the correction factor, c, developed earlier for computing the standard
deviation by the short method. When the guessed mean is taken as equal to
zero, $c_x$ is equal to $\Sigma X/N$, which gives the mean of the distribution.

The standard deviations of the distributions are next computed and entered
in column 18. They are equal to the square roots of the variances (column 17)
obtained for each variable.

The only check necessary for most of the computations in Table 9:4 is
Check V, which appears at the bottom of the table. In order to apply it,
columns 15 through 17 must be summed. The principle of this check is
similar to that of Check I for the data in Table 9:3; that is, the sum of a
series of differences obtained row by row from a table should be equal to the
difference between the sums of the respective columns. Thus, the sum of the
variances obtained in column 17 should equal the difference between the sum
of the values in column 16 and the sum of the values in column 15.

Check V does not verify the accuracy of any entries in Table 9:4 taken from
Table 9:3; these have to be checked independently. Nor does this check test
the accuracy of the square roots of the variances, i.e., the standard deviation
values in column 18. However, these latter values will be tested in Table 9:6
by Check VIII.
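The columns of Table 9:4 can be sketched in Python as follows (an illustration only; `standard_deviations` is a hypothetical name). It applies Formula 9:20 to each variable of Table 9:3 and raises an error if Check V fails. Because floating-point arithmetic is used, the check is made with `math.isclose` rather than exact equality.

```python
import math


def standard_deviations(scores):
    """Columns 14-18 of Table 9:4 (Formula 9:20), verified by Check V."""
    n = len(next(iter(scores.values())))
    means = {v: sum(s) / n for v, s in scores.items()}                      # col 14
    means_sq = {v: means[v] ** 2 for v in scores}                           # col 15
    mean_of_sq = {v: sum(x * x for x in s) / n for v, s in scores.items()}  # col 16
    variances = {v: mean_of_sq[v] - means_sq[v] for v in scores}            # col 17
    sds = {v: math.sqrt(variances[v]) for v in scores}                      # col 18
    # Check V: sum of col 17 = sum of col 16 - sum of col 15
    lhs = sum(variances.values())
    rhs = sum(mean_of_sq.values()) - sum(means_sq.values())
    if not math.isclose(lhs, rhs):
        raise ValueError("Check V failed")
    return variances, sds


scores = {
    "x": [15, 13, 11, 9, 17, 7, 15, 14, 16, 11],
    "y": [30, 24, 24, 20, 26, 15, 34, 27, 36, 22],
    "z": [9, 10, 5, 7, 9, 6, 11, 8, 9, 4],
}
variances, sds = standard_deviations(scores)
print({v: round(var, 2) for v, var in variances.items()})
print({v: round(sd, 2) for v, sd in sds.items()})
```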

Computation of the Mean of the Product Deviations of Each 
Bi-Variate Distribution (Table 9:5) 

The computation of the mean of the product deviations of a bi-variate
distribution by the short method may be formulated as follows:

$$\text{Mean of product deviations} = M_{uv} - M_u M_v \qquad \text{[9:21]}$$

Mean of product deviations of correlated variables with guessed means at zero

where the subscripts u and v stand for any two variables being correlated.
$M_{uv}$ is the mean of the product deviations when the deviations are taken
from a guessed mean of zero and are consequently equal to the values of the
original measures of the distributions being correlated. $M_u$ and $M_v$ are, as
usual, the means of the respective distributions of the variables being
correlated; they are the correction factors for the computation of r by this
method. The data in columns 20, 21, and 22 of Table 9:5 are from Table 9:3.

Check VI establishes the correctness of the products of the means in
column 23 and is based on the sums of columns 14 and 15 of Table 9:4.
Check VII verifies the sum of the means of the product deviations in
column 24.



Table 9:5. Mean of Product Deviations for Each Pair of Variables in Table 9:3

(where u and v stand for any two variables being correlated)

   (19)          (20)              (21)     (22)          (23)              (24)
Variables      Means of        Means of Distributions   Products of       Means of
Correlated   Cross-Products      (from Table 9:3)       Means of          Product
             (from Table 9:3)                           Correlated        Deviations
                                                        Distributions
 u and v         M_uv             M_u      M_v           M_u M_v        M_uv - M_u M_v
 x with y        345.9            12.8     25.8          330.24            15.66
 x with z        104.3            12.8      7.8           99.84             4.46
 y with z        209.6            25.8      7.8          201.24             8.36
Check sums       659.8                                   631.32            28.48

The mean of the product deviations of x and y: $M_{xy} - M_x M_y = 15.66$
The mean of the product deviations of x and z: $M_{xz} - M_x M_z = 4.46$
The mean of the product deviations of y and z: $M_{yz} - M_y M_z = 8.36$

Check VI:

$$(M_x + M_y + M_z)^2 - [(M_x)^2 + (M_y)^2 + (M_z)^2] = 2(M_x M_y + M_x M_z + M_y M_z)$$

(46.4)² - 890.32 = 2(631.32)

2152.96 - 890.32 = 1262.64

Check VII:

$$(M_{xy} + M_{xz} + M_{yz}) - (M_x M_y + M_x M_z + M_y M_z) = \Sigma(M_{uv} - M_u M_v)$$

659.8 - 631.32 = 28.48


The preceding formulation of the mean of the product deviations is for
use with the means obtained from Table 9:3. Formula 9:21 may be stated
in original score form as follows:

$$\text{Mean of product deviations} = \frac{\Sigma UV}{N} - \left(\frac{\Sigma U}{N}\right)\left(\frac{\Sigma V}{N}\right) \qquad \text{[9:21a]}$$
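A Python sketch of Table 9:5 follows (illustrative only; `product_deviation_means` is a hypothetical name). It applies Formula 9:21 to every pair of variables in Table 9:3 and raises an error if Check VI or Check VII fails. Pairs are generated with `itertools.combinations` in the order x with y, x with z, y with z.

```python
import math
from itertools import combinations


def product_deviation_means(scores):
    """Table 9:5: M_uv - M_u * M_v (Formula 9:21) for each pair, with Checks VI and VII."""
    n = len(next(iter(scores.values())))
    means = {v: sum(s) / n for v, s in scores.items()}
    pairs = list(combinations(scores, 2))
    m_uv = {(u, v): sum(a * b for a, b in zip(scores[u], scores[v])) / n
            for u, v in pairs}                                # col 20
    m_prod = {(u, v): means[u] * means[v] for u, v in pairs}  # col 23
    pdev = {p: m_uv[p] - m_prod[p] for p in pairs}            # col 24
    # Check VI: (sum of means)^2 - sum of squared means = 2 * sum of col 23
    check6 = sum(means.values()) ** 2 - sum(m * m for m in means.values())
    if not math.isclose(check6, 2 * sum(m_prod.values())):
        raise ValueError("Check VI failed")
    # Check VII: sum of col 20 - sum of col 23 = sum of col 24
    if not math.isclose(sum(m_uv.values()) - sum(m_prod.values()), sum(pdev.values())):
        raise ValueError("Check VII failed")
    return pdev


scores = {
    "x": [15, 13, 11, 9, 17, 7, 15, 14, 16, 11],
    "y": [30, 24, 24, 20, 26, 15, 34, 27, 36, 22],
    "z": [9, 10, 5, 7, 9, 6, 11, 8, 9, 4],
}
pdev = product_deviation_means(scores)
print({u + v: round(d, 2) for (u, v), d in pdev.items()})
```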

Computation of the Correlation Coefficients (Table 9:6) 

The final step in computing the inter-correlation coefficients is shown in
Table 9:6, and is based upon the formulation for r in Formula 9:18b. That is,
the correlation coefficient in each case is the ratio of the mean of the
product deviations obtained in Table 9:5 to the product of the standard
deviations of the two variables being correlated. The standard deviations were
obtained in Table 9:4. In column 26 of Table 9:6 are entered the means of the
product deviations from Table 9:5. In columns 27 and 28 are entered the
standard deviations of the variables being correlated, from Table 9:4.
Column 29 shows the products of the standard deviations. In column 30 are
entered the ratios of the means of the product deviations to the products of
the standard deviations, which give the correlation coefficients. The
coefficients for the data used to illustrate Method III range from .65 to .85.


















Table 9:6. The Product-Moment Correlation Coefficients

(where u and v stand for any two variables being correlated)

   (25)          (26)               (27)     (28)        (29)            (30)
Variables      Means of        Standard Deviations    Products of     Product-Moment
Correlated     Product          (from Table 9:4,      Standard        Correlation
               Deviations        column 18)           Deviations      Coefficients
               (from Table 9:5,
                column 24)
 u and v      M_uv - M_u M_v      σ_u      σ_v         σ_u σ_v             r_uv
 x with y        15.66            3.06     6.01        18.39      15.66 / 18.39 = .85 = r_xy
 x with z         4.46            3.06     2.14         6.54       4.46 / 6.54  = .68 = r_xz
 y with z         8.36            6.01     2.14        12.86       8.36 / 12.86 = .65 = r_yz
Check sum                                              37.79

The correlation coefficient for variable x with y: $r_{xy} = .85$
The correlation coefficient for variable x with z: $r_{xz} = .68$
The correlation coefficient for variable y with z: $r_{yz} = .65$

Check VIII:

$$(\Sigma\sigma)^2 - \Sigma(\sigma^2) = 2\Sigma(\text{product of standard deviations})$$

(11.21)² - 50.08 = 2(37.79)

75.58 = 75.58


The final check, Check VIII, to be applied to the computations of Table 9:6
is as follows:

Check VIII:

$$(\Sigma\sigma)^2 - \Sigma(\sigma^2) = 2\Sigma(\text{product of standard deviations})$$

(11.21)² - 50.08 = 2(37.79)

75.58 = 75.58

This check tests the accuracy of the products of the standard deviations
computed in this table. The first term of the check, $(\Sigma\sigma)^2$, is
obtained from the sum of column 18 in Table 9:4, and the second term is
obtained from the sum of column 17 in the same table. Check VIII thus tests the
accuracy not only of the standard deviations obtained in Table 9:4, but also of
the products of the standard deviations in column 29 of Table 9:6.

The means of the product deviations entered in column 26 must be
independently checked against the values for these means obtained in
Table 9:5. There remain to be checked only the ratios for the correlation
coefficients themselves, in column 30. A method for checking these ratios
independently is to repeat the operation in reverse, either (1) by multiplying
each correlation coefficient by the corresponding product of standard
deviations to give the value of the mean of the product deviations, or (2) by
dividing the mean of the product deviations by its corresponding correlation
coefficient to give the value of the corresponding product of the standard
deviations. If the computation of r is correct, either of these results should
check, except for dropped decimals.
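The whole of Table 9:6 can be sketched in a few lines of Python (an illustration, not the author's procedure; `inter_correlations` is a hypothetical name). It computes each r as the ratio of the mean of the product deviations to the product of the standard deviations, applies reverse check (1) described above, and applies Check VIII. Because the standard deviations here are not rounded to two decimals, Check VIII holds to floating-point precision rather than only to the dropped decimals of the printed table.

```python
import math
from itertools import combinations


def inter_correlations(scores):
    """Table 9:6 sketch: r = (M_uv - M_u M_v) / (sigma_u sigma_v) for each pair."""
    n = len(next(iter(scores.values())))
    means = {v: sum(s) / n for v, s in scores.items()}
    sd = {v: math.sqrt(sum(x * x for x in s) / n - means[v] ** 2)
          for v, s in scores.items()}
    pairs = list(combinations(scores, 2))
    rs = {}
    for u, v in pairs:
        pdev = (sum(a * b for a, b in zip(scores[u], scores[v])) / n
                - means[u] * means[v])              # column 26
        sd_product = sd[u] * sd[v]                  # column 29
        rs[(u, v)] = pdev / sd_product              # column 30
        # Reverse check (1): r times the SD product should restore pdev
        if not math.isclose(rs[(u, v)] * sd_product, pdev):
            raise ValueError(f"ratio check failed for {u} with {v}")
    # Check VIII: (sum of SDs)^2 - sum of variances = 2 * sum of SD products
    lhs = sum(sd.values()) ** 2 - sum(s * s for s in sd.values())
    rhs = 2 * sum(sd[u] * sd[v] for u, v in pairs)
    if not math.isclose(lhs, rhs):
        raise ValueError("Check VIII failed")
    return rs


scores = {
    "x": [15, 13, 11, 9, 17, 7, 15, 14, 16, 11],
    "y": [30, 24, 24, 20, 26, 15, 34, 27, 36, 22],
    "z": [9, 10, 5, 7, 9, 6, 11, 8, 9, 4],
}
rs = inter_correlations(scores)
print({u + v: round(r, 2) for (u, v), r in rs.items()})
# prints {'xy': 0.85, 'xz': 0.68, 'yz': 0.65}
```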


G. OTHER METHODS FOR THE COMPUTATION OF r 

We have illustrated three of the commonly used methods for computing
product-moment correlations. Methods I and III, for ungrouped data, ordinarily
have an advantage over Method II, since they give a more precise estimate
of r than is obtained when the data are grouped into class intervals.
Method II, however, has the advantage inherent in the portrayal of a
bi-variate distribution by a correlation chart: the investigator can usually
see whether a straight-line function is in reality the appropriate one to use
in the correlation of the two variables. Furthermore, the sacrifice of
mathematical precision when the data of one or both variables are grouped
into class intervals of several units is negligible when there are at least
12 class intervals and 40 or more correlational frequencies.

Short Methods II and III of course have the computational advantage over
Method I that is implied by the characterization "short." Method III, however,
is satisfactory from a computational point of view only if a calculating
machine is available, particularly for adding columns of numbers, squares,
and cross-products.

In the next chapter we shall discuss methods for the correlation of bi-variate
data that provide estimates of r in cases in which the use of the
product-moment method is not feasible or convenient. However, before
these different techniques are considered, two additional methods for
computing r should be discussed briefly: one by the sums of paired
deviations; the other, by the differences of paired deviations.* Both give a
result that is algebraically the same as that of the product-moment method.


The Method of Sums for r 


If s represents the sum of two paired deviations, x and y, then

$$r = \frac{\sigma_s^2 - \sigma_x^2 - \sigma_y^2}{2\sigma_x\sigma_y} \qquad \text{[9:22]}$$

Pearson r by the method of sums

where $\sigma_s^2$ is equal to the variance of the sums of all paired deviations;
$\sigma_x^2$ and $\sigma_y^2$ are the variances of the variables correlated; and
$\sigma_x$ and $\sigma_y$ are their standard deviations.

* Cf. Peters and Van Voorhis, op. cit., pp. 101-103, for the derivation of these
formulas from product-moment r.





This procedure for computing r is based on the sums of associated pairs
rather than on their products. The variance of the paired sums is as follows:

$$\sigma_s^2 = \frac{\Sigma(x + y)^2}{N} \qquad \text{[9:23]}$$

Variance of paired sums

This method for computing r is used only infrequently; however, the method
of differences has a very practical application that we shall illustrate with the
digit-span test scores in Table 9:2.
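To show that the method of sums reproduces the product-moment result, the Python sketch below (illustrative only; `r_method_of_sums` is a hypothetical name) applies Formulas 9:22 and 9:23 to the x and y scores of Table 9:3, where Method III gave $r_{xy} = .85$.

```python
import math


def r_method_of_sums(xs, ys):
    """Pearson r by the method of sums (Formulas 9:22 and 9:23)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    dx = [x - mx for x in xs]  # deviations x
    dy = [y - my for y in ys]  # deviations y
    var_x = sum(a * a for a in dx) / n
    var_y = sum(b * b for b in dy) / n
    var_s = sum((a + b) ** 2 for a, b in zip(dx, dy)) / n  # Formula 9:23
    return (var_s - var_x - var_y) / (2 * math.sqrt(var_x) * math.sqrt(var_y))  # 9:22


# x and y scores of the ten subjects in Table 9:3
X = [15, 13, 11, 9, 17, 7, 15, 14, 16, 11]
Y = [30, 24, 24, 20, 26, 15, 34, 27, 36, 22]
print(round(r_method_of_sums(X, Y), 2))  # prints 0.85
```

The agreement is exact in principle: expanding $\Sigma(x + y)^2$ gives $\Sigma x^2 + \Sigma y^2 + 2\Sigma xy$, so the numerator of 9:22 reduces to $2\Sigma xy/N$.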


The Method of Differences for r 


If d represents the difference between the deviations, x and y, of two paired
measures, the variance of all the paired differences of a bi-variate distribution
equals:

$$\sigma_d^2 = \frac{\Sigma(x - y)^2}{N} \qquad \text{[9:24]}$$

Variance of paired differences

and the correlation coefficient by the method of differences is:

$$r = \frac{\sigma_x^2 + \sigma_y^2 - \sigma_d^2}{2\sigma_x\sigma_y} \qquad \text{[9:25]}$$

Pearson r by method of differences

This formula can, under certain circumstances, be simplified to yield a
convenient method for computing a reliability coefficient of a test (cf.
Chapter 17, Section B). Thus, if the variances and the means of two forms of
a test, or two halves of a test, can be assumed to be equal, Formula 9:25
becomes:

$$r = 1 - \frac{\Sigma(D^2)}{2N\sigma_x^2} \qquad \text{[9:26]}$$

Special case of 9:25

where $\Sigma(D^2)$ is the sum of the squared differences between the original
values, X and Y, of each pair of associated measures; N is the number of
correlational frequencies; and $\sigma_x^2$ is the variance of the test.

The correlation coefficient for the digit-span data in Table 9:2 is computed
in Table 9:7 by this latter method. We saw in Table 9:2 that the mean of the
first digit-span test (x) was 7.45, and that of the second test (retest with
another series of numbers) was 7.65. The standard deviation of the first was
1.30, and of the second, 1.22. The means and standard deviations of the two
tests were thus similar, although not identical.

The difference, D, between each subject's pair of scores is obtained in
column 4, and these differences are squared in column 5 to give D². The sum
of the squared differences is 11.1250. If the square of the standard deviation
of the first administration of the test, which was found in Table 9:2 to be 1.3,
is used for the variance of the test, the coefficient r by the method of
differences is as follows:

$$r = 1 - \frac{11.1250}{2(20)(1.3)^2} = 1 - \frac{11.1250}{67.60} = 1 - .165 = .835, \text{ or } .84$$

which is identical with the value for r obtained in Table 9:2 by the
product-moment method.

Table 9:7. Computation of r by the Method of Differences
(Digit-Span Test Data and Variance of x from Table 9:2)*

   (1)         (2)         (3)          (4)            (5)
            Digit-Span Scores      Differences Between X and Y
Subjects    1st Test     Retest
               X            Y           D              D²
    A         8.5          8.25         .25           .0625
    B         6.75         8.25       -1.50          2.2500
    C         6.25         7.75       -1.50          2.2500
    D         6.0          5.5          .50           .2500
    E         7.0          7.25        -.25           .0625
    F         9.25         9.25         0             0
    G         5.5          5.75        -.25           .0625
    H         9.25         8.25        1.00          1.0000
    I         7.75         6.75        1.00          1.0000
    J         5.0          5.25        -.25           .0625
    K         6.5          7.25        -.75           .5625
    L         7.75         8.5         -.75           .5625
    M         9.0          8.0         1.00          1.0000
    N         7.5          8.25        -.75           .5625
    O         7.0          7.75        -.75           .5625
    P         7.75         8.25        -.50           .2500
    Q         5.75         5.75         0             0
    R         8.75         9.5         -.75           .5625
    S         8.75         8.50         .25           .0625
    T         9.0          9.00         0             0
 N = 20     149.0        153.0                    Σ = 11.1250

$$r = 1 - \frac{\Sigma D^2}{2N\sigma_x^2} = 1 - \frac{11.1250}{2(20)(1.3)^2}$$

* The value of $\sigma_x$ for these data was 1.3.

When this abbreviated method of differences is employed alone, two additional
columns are necessary in Table 9:7 in order to obtain the variance of the test.
These are columns (4) and (6), for x and x², in Table 9:2. If the abbreviated
method of Formula 9:26 cannot be employed, the variance and standard deviation
of y must also be computed, as indicated in Formula 9:25.
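The Python sketch below (illustrative only; `r_method_of_differences` and `r_differences_shortcut` are hypothetical names) contrasts the general method of differences, Formulas 9:24 and 9:25, with the shortcut of Formula 9:26 on the digit-span scores of Table 9:7. The general form gives about .84, as the product-moment method does. The shortcut, using $\sigma_x = 1.3$, gives .835 because it assumes equal means and variances, which these data only approximate.

```python
import math


def r_method_of_differences(xs, ys):
    """Pearson r by the general method of differences (Formulas 9:24 and 9:25)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mx) ** 2 for x in xs) / n
    var_y = sum((y - my) ** 2 for y in ys) / n
    var_d = sum(((x - mx) - (y - my)) ** 2 for x, y in zip(xs, ys)) / n  # 9:24
    return (var_x + var_y - var_d) / (2 * math.sqrt(var_x) * math.sqrt(var_y))


def r_differences_shortcut(xs, ys, sd_x):
    """Formula 9:26; assumes the two tests have equal means and variances."""
    sum_d2 = sum((x - y) ** 2 for x, y in zip(xs, ys))  # column 5 total
    return 1 - sum_d2 / (2 * len(xs) * sd_x ** 2)


# Digit-span scores of subjects A-T from Table 9:7
first = [8.5, 6.75, 6.25, 6.0, 7.0, 9.25, 5.5, 9.25, 7.75, 5.0,
         6.5, 7.75, 9.0, 7.5, 7.0, 7.75, 5.75, 8.75, 8.75, 9.0]
retest = [8.25, 8.25, 7.75, 5.5, 7.25, 9.25, 5.75, 8.25, 6.75, 5.25,
          7.25, 8.5, 8.0, 8.25, 7.75, 8.25, 5.75, 9.5, 8.5, 9.0]

print(round(r_differences_shortcut(first, retest, 1.3), 3))  # prints 0.835
print(round(r_method_of_differences(first, retest), 2))      # prints 0.84
```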




EXERCISES 

1. Cite two examples of perfect correlation other than those mentioned in this 
chapter. 

2. What is the difference between a scattergram and a correlation chart? 

3. Under what circumstances is it advisable to make a scattergram or correlation 
chart of a bi-variate distribution? 

4. What can be inferred about the correlation between two variables from a scatter¬ 
gram? 

5. What are the basic assumptions for the use of Pearson’s product-moment method 
of correlation? 

6. What is the difference between a correlational frequency and a statistical fre¬ 
quency? 

7. Why must the data of a bi-variate distribution be associated by individual pairs 
in order for a correlation coefficient to be calculated? 

8. What essential properties do some of the fourfold charts in Chapter 4 and the 
correlation chart in Fig. 9:8 have in common? 

Use the data in Table 5:14 for the following six problems:

9. Set up a correlation chart in z score form and estimate the degree of correlation
between the average grades and intelligence test scores of the college freshmen
from the regression line of grades on intelligence test scores.

10. Compute the correlation between the intelligence test scores of college freshmen 
and their best friends by means of the correlation chart method used in Fig. 9:13. 

11. Using the data for only the last 25 cases, compute by the long method of Table 9:2 
the product-moment correlation between grades of the college freshmen and of 
their best friends. 

12. Use Method III (Tables 9:3-9:6) to compute the inter-correlations between the
grades, intelligence test scores, and ages of the college freshman group.

13. Set up a regression equation in original score form and predict for the data in 
Exercise 10 the average intelligence test score received by the best friends of 
college freshmen whose intelligence test scores were 90. 

14. Set up a regression equation in original score form and predict for the data in 
Exercise 11 the average grade received by the best friends of college freshmen 
whose average grade was 73. 

15. Use the method of differences (Formula 9:25) to compute the correlation of the 
height-weight data in Table 9:1. 

Use the data of Table 9:8 for the following problems: 

16. Make a scattergram and then determine the correlation between 1943-44 total 
annual expenditures and expenditures for instruction in the 68 cities. 

17. What are the mean total expenditure and the mean expenditure for instruction? 

18. Is the variability in total expenditures among the 68 cities relatively different from
the variability in expenditures for instruction?

19. What is the average amount spent for instruction by cities whose total expenditures
were

a. over $200.00 

b. less than $100.00 

c. between $130.00 and $140.00 

d. between $170.00 and $180.00 





Table 9:8. Total 1943-44 Annual Expenditure per Pupil in Average Daily
Attendance, and Expenditure per Pupil for Instruction, in the School Systems of
68 Cities of from 30,000 to 100,000 Population*

City                             Total Annual     Annual Expenditure
                                 Expenditure      for Instruction
Fort Smith, Ark.                   $ 68.97            $ 54.13
Little Rock, Ark.                    70.66              55.14
Alhambra, Calif.                    156.62             118.13
Glendale, Calif.                    166.22             130.27
Santa Barbara, Calif.               175.59             131.69
Stamford, Conn.                     178.86             140.74
Waterbury, Conn.                    167.22             131.73
West Hartford, Conn.                136.83             106.33
Aurora (East Side), Ill.            126.67              96.79
Danville, Ill.                       98.53              68.91
Decatur, Ill.                        93.98              73.56
Elgin, Ill.                         119.39              93.10
Moline, Ill.                        117.39              79.53
Quincy, Ill.                        114.52              87.26
Rock Island, Ill.                    94.41              71.55
Elkhart, Ind.                       113.83              86.54
Evansville, Ind.                    126.59              97.65
Davenport, Iowa                     126.71              92.15
Dubuque, Iowa                       164.45             120.65
Ottumwa, Iowa                       101.25              78.11
Covington, Ky.                      128.35             102.85
Lexington, Ky.                      102.40              83.44
Brookline, Mass.                    195.44             149.63
Chicopee, Mass.                     146.48             109.06
Holyoke, Mass.                      175.01             131.75
Lynn, Mass.                         152.82             114.39
Medford, Mass.                      123.78             100.99
Salem, Mass.                        147.45             113.67
Battle Creek, Mich.                 131.02              95.34
Dearborn, Mich.                     170.70             124.74
Jackson, Mich.                      129.87              94.94
Kalamazoo, Mich.                    142.05             105.87
Lansing, Mich.                      130.12              99.53
Jackson, Miss.                       72.70              59.46
Joplin, Mo.                          81.72              60.73
Nashua, N. H.                       128.52              90.47
Atlantic City, N. J.                205.31             156.14
East Orange, N. J.                  203.96             164.91
Hoboken, N. J.                      207.95             149.51
Irvington, N. J.                    181.69             137.99
Montclair, N. J.                    242.66             192.95
New Brunswick, N. J.                178.22             143.25
Plainfield, N. J.                   179.04             140.75
Albuquerque, N. Mex.                 88.78              71.25
Elmira, N. Y.                       142.92             111.39

* From School Life, December, 1945, p. 23.





Table 9:8 (Continued)

City                             Total Annual     Annual Expenditure
                                 Expenditure      for Instruction
Jamestown, N. Y.                   $161.01            $116.56
Troy (Union District), N. Y.        166.50             117.20
White Plains, N. Y.                 257.54             193.73
Cleveland Heights, Ohio             196.40             137.70
Lakewood, Ohio                      190.43             143.98
Marion, Ohio                         85.88              63.73
Steubenville, Ohio                  126.92              96.72
Harrisburg, Pa.                     161.30             120.88
New Castle, Pa.                     132.50              96.18
Wilkes-Barre, Pa.                   151.29             114.41
Cranston, R. I.                     120.66              97.27
Spartanburg, S. C.                   73.10
El Paso, Tex.                        78.54
Port Arthur, Tex.                    81.13
Waco, Texas                          79.17
Petersburg, Va.                      90.65
Portsmouth, Va.                      84.89
Everett, Wash.                      110.18
Madison, Wis.                       155.07
Oshkosh, Wis.                       136.25             105.12
Racine, Wis.                        127.19              97.13
Sheboygan, Wis.                     117.80              87.80
West Allis, Wis.                    145.43             112.75







CHAPTER 10 


Special Methods for the Linear 
Correlation of Variables 


Several methods other than those described in the preceding chapter are
available for the linear correlation of two variables. These other methods
yield coefficients analogous, in their general implications, to r, and are used

(1) when the product-moment method described in Chapter 9 cannot be
directly employed because of the nature of the data to be correlated, or

(2) when one of these other methods is more desirable as a short-cut
computational procedure. The following methods will be described:*

A. Correlation of Ranks. 

B. Serial Correlation. 

C. Tetrachoric r. 


A. CORRELATION OF RANKS 
Purpose of the Method 

A method for the correlation of ranks was developed several decades ago 
by the English psychologist and statistician, the late Charles Spearman, in 
order to provide an estimate of linear correlation when either or both of the 
variables being correlated could at least be ranked if not quantitatively 
differentiated. This and the several other methods that were eventually 
developed are ordinarily used for variables that cannot be differentiated in 
a satisfactory quantitative way, but for which ratings can be differentiated 
and then ranked. However, product-moment r is a general method for the 
correlation of linear relationships, whether or not the data of one or both 
variables are quantitatively differentiated or are in the form of centiles or 
ranks. Hence, the method now to be described is essentially a short-cut 
procedure for bi-variate data. 

Variables that can be differentiated by ranks often occur in the ratings of
aesthetic qualities, of personalities, of achievement or success, of quality of
performance on a job, etc. Ratings are commonplace in psychological
measurement.

* The φ coefficient and the Coefficient of Mean Square Contingency are also used to
give estimates of r, i.e., linear correlation. Since they were presented in Chapter 4, they
will not be discussed further in the present chapter.


Methods for correlating ranks are also sometimes used as short-cut devices 
for the correlation of variables that are actually quantitatively differentiated, 
as when only small groups or samples of data are available for correlation. 
Rank methods are often employed under these circumstances rather than 
the product-moment method because they are easier to apply, and, under 
certain circumstances, they yield as satisfactory a result. 

Spearman's Rank-Difference Method 

Three variations of methods for the correlation of ranks have been
developed.* They are as follows:

1. The Rank-Difference Method. 

2. The Rank-Product Method. 

3. The Rank-Sum Method. 

The first of these has long been used and is known as the Spearman
rank-difference method. Its computational procedure is based upon the differences
between the ranks of each associated pair of ratings or scores for the variables
being correlated. The second method is based upon the products of these
ranks, and the third upon their sums. The tables developed by Du Bois have
made the computations necessary for any one of these methods somewhat
simpler.† Inasmuch as all these methods yield the same result, only the one
most commonly used, the rank-difference method, will be described.

When a different rank can be assigned to each member or case of the
variables being correlated, the coefficient obtained is equivalent to the
product-moment r computed from the ranks themselves. In other words, if the
ratings obtained for a variable are all different, they can be ranked 1, 2, 3, 4,
..., n. Often this ranking is impossible because two or more cases may receive
the same rating or score. Several procedures have been developed for treating
such duplications.‡ However, if ratings or scores are duplicated in only a small
proportion of the cases, the resulting coefficient should not be particularly
distorted and will still yield a satisfactory estimate of linear correlation.

In order to distinguish a correlation coefficient obtained by a rank method
from one obtained by the product-moment method, ρ (rho), the Greek letter r,
is used.

The development of Spearman's rank-difference method is illustrated in
Table 10:1. The correlation coefficient itself is computed by the following
formula:

$$\rho = 1 - \frac{6\,\Sigma D^2}{N(N^2 - 1)} \qquad [10:1]$$

Spearman's rank-difference coefficient for linear correlation

where ΣD² represents the sum of the squares of the differences between the
ranks of each associated pair, 6 is a constant, and N as usual is the number
of associated pairs in the group or sample of the variables correlated.

* These three variations are analogous to those for product-moment r described at the end
of the last chapter, viz., the methods of sums, of differences, and of product deviations.
Spearman's rank-difference method is a special case of the method of differences for r.
(Cf. the end of the present section.)

† Philip H. Du Bois, "Formulas and Tables for Rank Correlation," Psychological Record,
3:46-56, 1939.

‡ Ibid.


Table 10:1. Spearman's Method of Rank-Difference Correlation: Correlation
Between Aptitude Test Scores and Achievement Ratings

   (1)         (2)            (3)              (4)              (5)         (6)
 Subject   Achievement     Scores on     Rank or Mid-Rank       Rank         D²
           Ratings by    Aptitude Test      for Ties:       Differences
             Ranks                        Aptitude Test         (D)

    A           1              72               2                1.0         1.0
    B           2              60               5.5              3.5        12.25
    C           3              65               4                1.0         1.0
    D           4              60               5.5              1.5         2.25
    E           5              76               1                4.0        16.0
    F           6              68               3                3.0         9.0
    G           7              52              12.5              5.5        30.25
    H           8              56               8.0              0           0
    I           9              54              10.5              1.5         2.25
    J          10              56               8.0              2.0         4.0
    K          11              52              12.5              1.5         2.25
    L          12              54              10.5              1.5         2.25
    M          13              48              14                1.0         1.0
    N          14              56               8.0              6.0        36.0
    O          15              36              19                4.0        16.0
    P          16              46              15                1.0         1.0
    Q          17              40              17.5               .5          .25
    R          18              30              20                2.0         4.0
    S          19              40              17.5              1.5         2.25
    T          20              44              16                4.0        16.0

                                                              ΣD² = 159.00

$$\rho = 1 - \frac{6\,\Sigma D^2}{N(N^2 - 1)} = 1 - \frac{6(159)}{7980} = 1 - .1195 = .88$$


The variables correlated in Table 10:1 (achievement ratings and aptitude
scores) are typical of the situations for which the rank method is
useful. Results for only 20 subjects are presented in the table. The subjects
are designated by the letters A to T in column 1. Each subject's achievement
rating is listed in rank order in column 2; there are no ties in the ratings.
Subject A received the highest rating and consequently has a rank of 1;
Subject B received the next highest rating and has a rank of 2, etc. The
aptitude test scores of the 20 subjects are listed in column 3. They range in
size from 30 to 76, but they are not listed in order of size because each
subject's aptitude score must be paired with his achievement rating. Only if the
correlation between scores and ratings were perfect (1.00) would the scores
in column 3 be listed in order of size.

Ranking the Test Scores 

Since the achievement ratings are already ranked in column 2, they need no
further adjustment before the correlation coefficient is computed. However,
the aptitude test scores in column 3 must be ranked before the computations
can be made. These ranks, which are presented in column 4, are obtained as
follows:

The subject receiving the highest score on the aptitude test (Subject E) is
given a rank of 1. The one with the next highest score (Subject A) is given a
rank of 2. The third highest score was received by Subject F; the fourth
highest, by Subject C. Subjects B and D, however, both received the same
score, 60. These two cases represent a duplication and require an adjustment
in the ranking procedure, since one subject cannot very well be given a
rank of 5 and the other a rank of 6. Several methods have been developed
for making this adjustment,* but the one generally found most satisfactory
and the simplest to use is averaging the ranks involved in any duplication.

Averaging Ranks 

Since the scores of Subjects B and D are the fifth and sixth cases in order
of size, their average rank is 5.5, and this rank is assigned to both of
them in column 4. The seventh highest score in column 3 is 56, but three
subjects, H, J, and N, have this score. These three cases are the seventh,
eighth, and ninth in order of size, and hence the average of their ranks is
8.0. Although the ranks in column 4 can be obtained by inspection, the surest
way to avoid error is to list the scores to be ranked in order of size and to
number the order of the scores thus listed, as follows:


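    76 .......... 1
    72 .......... 2
    68 .......... 3
    66 .......... 4
    60 .......... 5 }
    60 .......... 6 }  = 5.5
    56 .......... 7 }
    56 .......... 8 }  = 8.0
    56 .......... 9 }
    54 .......... 10 }
    54 .......... 11 } = 10.5
    52 .......... 12
    52 .......... 13
    48 .......... 14
    46 .......... 15
    44 .......... 16
    40 .......... 17
    40 .......... 18
    36 .......... 19
    30 .......... 20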




Such a procedure makes it obvious that there are two cases with a value of
60; they were fifth and sixth in order of size, and therefore their average rank
is 5.5. Similarly, there are three cases with the value of 56; they were the
seventh, eighth, and ninth in order of size, and their average rank is 8.0.
There are two scores of 54; here the average rank is 10.5. There are also
two scores of 52 with an average rank of 12.5, and two scores of 40 whose
average rank is 17.5.

* Philip H. Du Bois, "Formulas and Tables for Rank Correlation," Psychological Record,
3:46-56, 1939.

The rank of the lowest score should of course equal N, the number of cases
in the distribution, unless the lowest score is duplicated; this provides a
check on the ranking.

Once the ranks are obtained by the above procedure, they are entered in 
column 4 for each subject. 
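The listing procedure above (arrange the scores in order of size, number them, and give tied scores the average of the positions they occupy) can be sketched in Python as follows. This is a minimal sketch, assuming rank 1 goes to the highest score as in Table 10:1; the function name `mid_ranks` is our own, not part of the original text.

```python
def mid_ranks(scores):
    """Rank scores from highest (rank 1) to lowest, giving tied scores
    the average of the positions they jointly occupy."""
    # Indices of the cases, arranged in descending order of score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        # Extend 'end' over the run of cases tied with the case at 'pos'.
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        # Positions pos..end (0-based) are ranks pos+1..end+1; average them.
        avg = (pos + 1 + end + 1) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks

# Aptitude scores of Subjects A to T, column 3 of Table 10:1.
aptitude = [72, 60, 65, 60, 76, 68, 52, 56, 54, 56,
            52, 54, 48, 56, 36, 46, 40, 30, 40, 44]
print(mid_ranks(aptitude))  # should reproduce column 4 of Table 10:1
```

Because averaging only redistributes the tied positions, the ranks always sum to N(N + 1)/2 (210 for N = 20), which gives a further quick check on hand work.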


The Computation of Rho 

With the data of both variables now ranked, we can proceed to the
computation of the correlation coefficient. This involves two steps: (1) the
difference between the ranks of each associated pair must be obtained and
entered in column 5, and (2) these differences must be squared and entered
in column 6. Since only the sum of the squared differences is needed to
compute ρ, it is unnecessary to take into account the direction of the difference
between each rank pair or to use a plus or minus sign in column 5.

The sum of the squared differences is indicated at the bottom of column 6.
This value, 159.0, is needed for the computation of rho at the bottom of the
table. The coefficient is equal to .88, a relatively high correlation. Although
very few associated pairs had the same rank on both the achievement ratings
and the aptitude test, the correlation is substantial because the differences
between the ranks of associated pairs were not very large. The greatest
difference was for Subject N, whose ranks on the achievement rating and the
aptitude test were 14 and 8.0 respectively. This rho coefficient of .88 is
analogous to a product-moment r and indicates the degree of co-relationship
between two variables that r itself would imply.
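The two computational steps, differencing and squaring, followed by Formula 10:1, can be sketched as below. This is a minimal illustration, not the author's procedure; the function name `spearman_rho` is hypothetical. The rank lists are columns 2 and 4 of Table 10:1.

```python
def spearman_rho(ranks_x, ranks_y):
    """Return (sum of D^2, rho) by Formula 10:1:
    rho = 1 - 6 * sum(D^2) / (N * (N^2 - 1))."""
    if len(ranks_x) != len(ranks_y):
        raise ValueError("rank lists must be paired case by case")
    n = len(ranks_x)
    # Squaring removes the sign, so the direction of each difference is irrelevant.
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return sum_d2, 1 - 6 * sum_d2 / (n * (n * n - 1))

achievement = list(range(1, 21))  # column 2: Subjects A to T ranked 1 to 20
aptitude_ranks = [2, 5.5, 4, 5.5, 1, 3, 12.5, 8, 10.5, 8,
                  12.5, 10.5, 14, 8, 19, 15, 17.5, 20, 17.5, 16]  # column 4
sum_d2, rho = spearman_rho(achievement, aptitude_ranks)
print(sum_d2, round(rho, 2))  # 159.0 0.88
```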

When the differences between the ranks of the associated pairs of two
variables are at a maximum, the correlation coefficient is not zero but
-1.00. This is so because the subject with the highest rank on one variable
will then have the lowest rank on the other, etc., and a perfect inverse relation
will obtain. A coefficient close to zero occurs when there is no relationship
between the ranks of the two variables being correlated.
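A short worked derivation (our own addition, using only Formula 10:1) shows why the extreme case gives exactly -1.00. If the subject ranked i on one variable is ranked N + 1 - i on the other, then D_i = N + 1 - 2i, and

$$\Sigma D^2 = \sum_{i=1}^{N}(N+1-2i)^2 = N(N+1)^2 - 4(N+1)\frac{N(N+1)}{2} + 4\,\frac{N(N+1)(2N+1)}{6} = \frac{N(N^2-1)}{3}$$

$$\rho = 1 - \frac{6}{N(N^2-1)}\cdot\frac{N(N^2-1)}{3} = 1 - 2 = -1$$

For the 20 subjects of Table 10:1 this maximum would be ΣD² = 20(399)/3 = 2660, against the observed 159.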


The Relation of r to Rho 

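We saw in the preceding chapter that linear correlation may under certain
circumstances be obtained by a method of differences, for which the formula
is

$$r = 1 - \frac{\Sigma d^2}{2N\sigma^2} \qquad [9:26]$$

where d is the difference between the paired scores and σ is the common
standard deviation of the two variables.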





This formula is based on the assumption that the means and standard
deviations of both variables correlated are equal. This assumption holds for
any two series of ranks of the same length, 1 to N. Thus, if two series of 5
ranks each are correlated, the mean rank of both will be 3, since

(1 + 2 + 3 + 4 + 5)/5 = 3

Generally, the mean of any series of ranks, 1 to n, is equal to

$$M_{\text{ranks}} = \frac{n + 1}{2} \qquad [10:2]$$

Mean of a series of ranks, 1 to n

And the standard deviation of a series of ranks is:

$$\sigma_{\text{ranks}} = \sqrt{\frac{x_1^2 + x_2^2 + x_3^2 + \cdots + x_n^2}{N}} \qquad [10:3]$$

Standard deviation of a series of ranks, 1 to n

where x₁, ..., xₙ are the successive ranks expressed as deviations from
the mean rank. Thus, for 5 ranks:

$$\sigma_{\text{ranks}} = \sqrt{\frac{2^2 + 1^2 + 0^2 + 1^2 + 2^2}{N}} = \sqrt{10/5} = \sqrt{2} = 1.41$$

Under these circumstances of equal means and standard deviations,
Formula 9:26 for the correlation of differences may be adapted to the special
case of differences between ranks, as follows:

$$r = 1 - \frac{\Sigma D^2}{2N\sigma^2_{\text{ranks}}}$$


The standard deviation of a series of n ranks can be obtained more readily
by the following formula than by its equivalent, Formula 10:3:

$$\sigma_{\text{ranks}} = \sqrt{\frac{n^2 - 1}{12}} \qquad [10:3a]$$

Standard deviation of n ranks
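As a check on Formulas 10:2 and 10:3a, the sketch below computes the mean and standard deviation of the ranks 1 to n directly (squared deviations from the mean rank, divided by N, as in Formula 10:3) and prints them beside the short formulas. The helper name `rank_mean_sd` is our own, for illustration only.

```python
import math

def rank_mean_sd(n):
    """Mean and standard deviation of the ranks 1..n, computed directly:
    deviations from the mean rank are squared, summed, and divided by N."""
    ranks = range(1, n + 1)
    mean = sum(ranks) / n
    sd = math.sqrt(sum((r - mean) ** 2 for r in ranks) / n)
    return mean, sd

for n in (5, 20):
    mean, sd = rank_mean_sd(n)
    # Compare with Formula 10:2, (n + 1)/2, and Formula 10:3a, sqrt((n^2 - 1)/12).
    print(n, mean, (n + 1) / 2, round(sd, 4), round(math.sqrt((n * n - 1) / 12), 4))
```

For n = 5 both pairs agree (3.0 and 1.4142), matching the value 1.41 obtained above.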


where n, the number of ranks in a distribution, is equal to N, the number
of cases to be ranked. Substituting this value of σ_ranks in the above formula,
and substituting rho for r, since we are now dealing with the special case of
rank differences, we have:

$$\rho = 1 - \frac{\Sigma D^2}{2N\left(\dfrac{N^2 - 1}{12}\right)} = 1 - \frac{\Sigma D^2}{\dfrac{N(N^2 - 1)}{6}} = 1 - \frac{6\,\Sigma D^2}{N(N^2 - 1)}$$

which is Formula 10:1.
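The derivation implies that, for untied ranks, Formula 10:1 gives exactly the product-moment r of the ranks. The sketch below checks this numerically; the sample size (12) and the random seed are arbitrary choices of ours, and `pearson_r` and `rho_formula` are illustrative names. When ties are replaced by mid-ranks, as in Table 10:1, the two need not agree exactly, because the standard deviation of the mid-ranks is then smaller than sqrt((N² - 1)/12).

```python
import math
import random

def pearson_r(x, y):
    """Product-moment r computed from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rho_formula(x, y):
    """Formula 10:1 applied to two lists of ranks."""
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

random.seed(7)                          # arbitrary, for repeatability only
ranks_x = list(range(1, 13))            # 12 untied ranks
ranks_y = random.sample(ranks_x, 12)    # an arbitrary untied re-ordering
print(pearson_r(ranks_x, ranks_y), rho_formula(ranks_x, ranks_y))
```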


B. SERIAL CORRELATION 

The linear correlation of a continuously distributed variable with one 
which is segmented or divided into only a few or even two classes is known 
as serial correlation. Until recently, methods for serial correlation were 
available only for biserial r, in which the segmented variable is dichotomized. 





We shall present first the methods of biserial r and point-biserial r and then 
describe methods of serial correlation for variables that are segmented into 
more than two broad classes. 

Biserial Correlation 

Purpose of the Method 

The biserial method for linear correlation is useful for situations in which
one of the bi-variates is dichotomized rather than continuously distributed.
The method is based on the assumption that the variable which is
dichotomized would, if quantitatively differentiated, yield the normal, bell-type of
distribution. The continuously distributed variable that is correlated with the
dichotomized one is not, however, assumed to be normally distributed. The
correlation coefficient obtained by the method of biserial r is symbolized by
r_bi because it is analogous in its implications about co-variability to
product-moment r.

Biserial r has been extensively used in analyses of the value of single items
of psychological tests, and of the relation between test results and a
dichotomized criterion, such as ratings of success and failure, good and poor, etc.
(cf. Chapter 17, Section C). The usefulness of biserial r in this latter type of
situation is illustrated by Table 10:2, showing the relationship between a
clerical proficiency test and ratings of 133 clerical employees in the relevant
skill.

When the problem in psychometrics is to determine which items of a test
yield the best and which the poorest results, a coefficient of correlation is
useful as an index of the differentiating value of single items. A valuable test
item is one that is answered in such a way that those doing well, either on
the test as a whole or on an independent criterion of efficiency, generally
respond in the same way to the item, whereas those doing poorly on the test
as a whole or on the independent criterion generally give a contrasting
response. An item of little or no value is one that produces no differentiation
with respect to the criteria used. By means of an empirical analysis
of the items on a preliminary or tentative test, the investigator can establish
a final test consisting only of items whose usefulness has been functionally
demonstrated.

Responses to test items are often dichotomized initially; that is, the answers
are scored as right or wrong, correct or incorrect. Variables that are
quantitatively distributed are also sometimes dichotomized for purposes of correlation
analysis into "satisfactory" and "unsatisfactory," "success" and "failure,"
etc. Thus, the correlation between the aptitude scores and the achievement
ratings of the 20 subjects in Table 10:1 might be recast and obtained by the
method of biserial correlation. In fact, the latter would ordinarily be used
for validation of aptitude test scores when the group included many more
individuals and it would consequently be relatively difficult to assign different





achievement ratings to all the members. In such a situation, each member of 
the group could be given a criterion rating of his performance in terms of 
“satisfactory” or “unsatisfactory.” 


Computation of Biserial r 

We shall illustrate the computation of biserial r in several different
situations for which this method is characteristically valuable. First, we shall
apply it in evaluating the usefulness of a clerical proficiency test for predicting
success or failure in a clerical situation. We shall then use the method to evaluate an
item of a vocabulary test against the total test score. This constitutes an
example of internal validation, inasmuch as the differentiating value of the
item is analyzed with respect to the total test of which it is a part, rather than
with respect to an external criterion of success or failure. The internal
validation of test items is common in psychometrics, but unfortunately the
procedure is not as sound as validation against an independent, external criterion
of efficiency (cf. Chapter 17, Section D).

The basic formula for biserial r is as follows:

$$r_{bi} = \left(\frac{M_h - M_l}{\sigma_t}\right)\left(\frac{p_h q}{y}\right) \qquad [10:4]$$

Biserial coefficient for linear correlation

where

M_h is the mean of the distribution of test scores for the part of the total
group receiving the higher criterion rating.

M_l is the mean score for the remainder of the total group (lower rating part).

σ_t is the standard deviation of the test distribution for the total group.

p_h is the proportion of the total group in the higher criterion group.

q is the proportion of the total group in the lower criterion group.

y is the value of the ordinate on a normal curve at the point that divides
the total distribution into two parts, with the proportion of the area above the
point equal to p_h. (The value of the ordinate at such a point can be obtained
from Table 10:3, which gives these values for a normal distribution in which
the total area is taken as unity.)

A more convenient modification of this formula makes unnecessary the
computation of the mean test score for the group with the lower criterion
ratings. It is as follows:*

$$r_{bi} = \left(\frac{M_h - M_t}{\sigma_t}\right)\left(\frac{p_h}{y}\right) \qquad [10:4a]$$

Dunlap's formula for biserial correlation

The symbols have the same meaning as in Formula 10:4, except that M_t, the
mean of the total distribution, is employed instead of M_l, and p_h is used instead
of p_h q. The two formulas are equivalent because M_t = p_h M_h + q M_l, so that
M_h - M_t = q(M_h - M_l). Formula 10:4a is used in the following examples.
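Both formulas can be written directly from their definitions. The sketch below is an illustration under stated assumptions, not the author's procedure: the ordinate y is computed from Python's `statistics.NormalDist` instead of being read from Table 10:3, σ_t is the population standard deviation (divisor N, as in the text), and the scores are hypothetical. It shows that Formulas 10:4 and 10:4a give the same coefficient.

```python
from statistics import NormalDist, fmean, pstdev

def ordinate(p_high):
    """Unit-normal ordinate y at the point leaving the proportion p_high
    of the total area (taken as 1) above it."""
    unit = NormalDist()  # mean 0, standard deviation 1
    return unit.pdf(unit.inv_cdf(1 - p_high))

def biserial_10_4(high, low):
    """Formula 10:4: ((M_h - M_l) / sigma_t) * (p_h * q / y)."""
    total = high + low
    p = len(high) / len(total)
    q = 1 - p
    return (fmean(high) - fmean(low)) / pstdev(total) * (p * q / ordinate(p))

def biserial_10_4a(high, low):
    """Formula 10:4a (Dunlap): ((M_h - M_t) / sigma_t) * (p_h / y)."""
    total = high + low
    p = len(high) / len(total)
    return (fmean(high) - fmean(total)) / pstdev(total) * (p / ordinate(p))

# Hypothetical test scores, for illustration only.
high = [41, 45, 38, 50, 44, 47]        # higher criterion group
low = [30, 36, 33, 41, 28, 35, 39]     # lower criterion group
print(round(biserial_10_4(high, low), 3), round(biserial_10_4a(high, low), 3))
```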

* J. W. Dunlap, "Note on Computation of Biserial Correlations in Item Evaluation,"
Psychometrika, 1:51-60, 1936. Dunlap's article includes a table of p/y values to four decimal
places for p values of .000 to .999. In the same issue of Psychometrika Dunlap also presents
a nomograph for the computation of biserial correlations.





The computations needed, therefore, in calculating biserial r by this latter
method are those that will yield the mean and standard deviation of the test
scores of the total group, and the mean of the test scores for the part of the
total group that receives the higher ratings. The short method has been used
in computing M and σ in Tables 10:2 and 10:5. Whatever method is used for
computing these measures, the distributions used must be set up in the
same way as those obtained from cross-tabulating the continuous and
dichotomous variables.

Biserial Correlation of Clerical Proficiency Test Results with Independent
Ratings of Efficiency

A clerical proficiency test, consisting of 53 items of an information type
and designed to measure proficiency (not aptitude, or potential skill), was
developed for use in the classification and selection of clerical workers whose
chief task would be the preparation of technical correspondence in proper
form.* The test was administered to a group of 133 employed stenographers,
typists, and clerks, each of whom was independently rated on the following
3-point scale for efficiency in clerical work:

1. NO SKILL: For employees with no training who were judged to have had no
significant experience in the technical correspondence under consideration.

2. SOME SKILL: For employees who had had appreciable training or experience,
or both, but were not judged competent in the technical correspondence
at the time the test was administered.

3. SKILLED: For employees judged fully competent in the technical
correspondence, a competent person being defined as one who may be charged
with responsibility for the preparation of the correspondence in proper
form, from a copy or notes or instructions which provide only the
substance of such communications, with no review for form prior to the
preparation of final copy.

All ratings were obtained by the psychologist in conference with the
supervisors of the 133 subjects. The ratings were as follows: 33 of the total group
were rated as having NO SKILL, 37 as having SOME SKILL, and 63 as SKILLED.

It will be observed that the independent criterion ratings for this group
were classified into three rather than two classes. A 3-point rating scale is
often easier and more desirable than a 2-point scale for such purposes. A
triserial coefficient for computing a linear correlation between a scale of
trichotomized ratings and a distribution of test scores will be presented later
in this section. Biserial r can, however, be employed with the clerical proficiency
test data if the ratings of two adjacent classes are combined; but it cannot
be employed with only the upper and lower parts of a distribution, as, for
example, SKILLED vs. NO SKILL, with the SOME SKILL group omitted.

* Data through courtesy of E. E. Cureton, of Richardson, Bellows, Henry and Co., Inc.,
New York City.




This is so because biserial r is based on the assumption that the dichotomy is
of a continuous normal distribution. Since both the NO SKILL and the
SOME SKILL groups may be defined as not skilled, they will be combined,
and the result will be a dichotomy of the SKILLED and the NOT SKILLED.
This dichotomy is used in Table 10:2, which lists the test scores made by the
total group as well as by each dichotomized part, and illustrates the
computation of biserial r for the results.


Table 10:2. Biserial Correlation of Clerical Proficiency Test Results
with Criterion Ratings

                  Criterion Ratings
  (1)          (2)           (3)        (4)       (5)     (6)      (7)      (8)     (9)
  Test     Some Skill      Skilled     Total
 Scores    and No Skill                Group
               f_l           f_h        f_t        x'    f_t x'  f_t x'²    x''   f_h x''

 51-52          0             1          1         9       9       81       6        6
 49-50          1             5          6         8      48      384       5       25
 47-48          0             7          7         7      49      343       4       28
 45-46          3             8         11         6      66      396       3       24
 43-44          2             4          6         5      30      150       2        8
 41-42          2             8         10         4      40      160       1        8
 39-40          1             9         10         3      30       90       0        0
 37-38          6             7         13         2      26       52      -1       -7
 35-36          7             2          9         1       9        9      -2       -4
 33-34          6             3          9         0       0        0      -3       -9
 31-32          4             2          6        -1      -6        6      -4       -8
 29-30          8             4         12        -2     -24       48      -5      -20
 27-28          6             1          7        -3     -21       63      -6       -6
 25-26          9             2         11        -4     -44      176      -7      -14
 23-24          3             0          3        -5     -15       75
 21-22          1             0          1        -6      -6       36
 19-20          6             0          6        -7     -42      294
 17-18          4             0          4        -8     -32      256
 15-16          1             0          1        -9      -9       81

            N_l = 70      N_h = 63   N_t = 133          +307     2700              +99
                                                        -199                       -68
                                                       Σ = 108                   Σ = 31

$$M_t = 33.5 + 2\left(\frac{108}{133}\right) = 35.12$$

$$\sigma_t = 2\sqrt{\frac{2700}{133} - \left(\frac{108}{133}\right)^2} = 8.86$$

$$M_h = 39.5 + 2\left(\frac{31}{63}\right) = 40.48$$

$$p_h = \frac{63}{133} = .47$$

y = .398 (from Table 10:3, a = .03)

$$r_{bi} = \left(\frac{40.48 - 35.12}{8.86}\right)\left(\frac{.47}{.398}\right) = .72$$


The scores obtained from the 53-item clerical test are grouped in class
intervals of two units in column 1 of this table. The distribution of frequencies
for the test results for all 133 cases, the total group, is given in column 4.
The cross-tabulation of each dichotomized part with the test results is shown
in columns 2 and 3. The test results for the NOT SKILLED (the SOME SKILL
group combined with the NO SKILL group) are listed in column 2 and include 70 cases.
This part of the total group, the lower part, is symbolized by l. The test
results for the 63 in the SKILLED group, the higher part, are given in column 3
and symbolized by h. It is apparent at once that the average test results for
the SKILLED group are somewhat higher than the results for the NOT-SKILLED
group, which of course indicates a positive correlation between these results
and the criterion ratings. The problem now is to determine the degree of this
correlation.

The initial figures for the mean and standard deviation of the test results
for the total group are indicated in columns 5, 6, and 7, and the final
computations are shown at the bottom of the table. M_t is equal to 35.12 and σ_t
is equal to 8.86.

The initial figures for the mean test score of the part of the total group with
the higher ratings are given in columns 8 and 9, and, as indicated at the
bottom of the table, M_h is equal to 40.48.

It remains only to determine the values of p_h and y. These computations
are shown at the bottom of the table, and the two values are .47 and
.398 respectively.

The correlation coefficient is now computed and is found to be .72. This,
then, is an index of the degree of relationship between the group's results on
the test of clerical proficiency and the independent criterion ratings of their
efficiency or skill for this type of clerical work. A coefficient of this kind,
between test results and an independent criterion, is called a validity coefficient of a test. It is the index
of the validity of this test in differentiating the SKILLED and NOT-SKILLED clerical
workers for technical correspondence of the type under study. The coefficient,
.72, is satisfactory for an achievement or proficiency type of test. It shows
that this test should be of considerable aid in the immediate differentiation
of SKILLED and NOT-SKILLED applicants for such positions; the test should
differentiate those who will need training from those who can go right to
work. Only in the case of perfect correlation, however, would it be possible
to make such a differentiation with no errors. Thus columns 2 and 3 in
Table 10:2 show that a few of the NOT-SKILLED group made fairly high scores
on the test; 9 out of the 70 had scores greater than 38. However, two-thirds
of the SKILLED group scored above 38. None of this group had scores below
25, whereas more than 20% of the NOT-SKILLED group scored less than this.
Between the test scores of 25 and 38 there is considerable overlapping;
consequently, in using the test, those who receive scores within this range are
better classified as doubtful than as skilled or unskilled.
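The computations at the bottom of Table 10:2 can be reproduced from the frequencies in columns 3 and 4. The sketch below is illustrative only: it uses direct deviations from the class midpoints rather than the short method (the results are identical), and it computes y from `statistics.NormalDist` rather than reading it from Table 10:3.

```python
import math
from statistics import NormalDist

# Class midpoints 51.5, 49.5, ..., 15.5 for the intervals 51-52 down to 15-16.
midpoints = [51.5 - 2 * k for k in range(19)]
# Frequencies of Table 10:2: column 3 (SKILLED) and column 4 (total group).
f_high = [1, 5, 7, 8, 4, 8, 9, 7, 2, 3, 2, 4, 1, 2, 0, 0, 0, 0, 0]
f_total = [1, 6, 7, 11, 6, 10, 10, 13, 9, 9, 6, 12, 7, 11, 3, 1, 6, 4, 1]

def grouped_mean_sd(mids, freqs):
    """Mean and standard deviation (divisor N) of a grouped distribution."""
    n = sum(freqs)
    mean = sum(m * f for m, f in zip(mids, freqs)) / n
    var = sum(f * (m - mean) ** 2 for m, f in zip(mids, freqs)) / n
    return mean, math.sqrt(var)

m_t, sd_t = grouped_mean_sd(midpoints, f_total)
m_h, _ = grouped_mean_sd(midpoints, f_high)
p = sum(f_high) / sum(f_total)                   # p_h = 63/133
y = NormalDist().pdf(NormalDist().inv_cdf(1 - p))
r_bi = (m_h - m_t) / sd_t * (p / y)              # Formula 10:4a
print(round(m_t, 2), round(sd_t, 2), round(m_h, 2), round(r_bi, 2))  # 35.12 8.86 40.48 0.72
```

Carried through unrounded, p_h = 63/133 = .4737 and y = .3981, which give .72. Substituting the rounded values .47 and .398 gives about .71, so the .72 reported in the text corresponds to the unrounded proportion.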

Before proceeding with the discussion of biserial r, we wish to emphasize
that if there is little or no difference between the means of the distributed
variable for the dichotomized parts of the whole (or between the means of
either part and the whole), there is obviously no basis for correlation. Many
problems for which a biserial r could be calculated are analyzed by comparing
the means of the two groups. If the difference between the means is zero or
insignificant (see Chapter 14, Sections E and F, for the implications of a
significant difference between means in sampling theory), it follows that the
correlation is zero or not significantly different from zero, and it does not need to be calculated. On the other hand,
when, as in the preceding example, the difference between the means is
considerable, there is a basis for some degree of correlation, and biserial r is a
convenient coefficient for indexing the degree of the relationship. More
information can usually be obtained from biserial r than from only the means
of the dichotomized parts of a distribution.

The Ordinate Values of Table 10:3 

As indicated at the bottom of Table 10:2, the value for y in computing
biserial r was obtained from the ordinate values in Table 10:3. This table
gives both the z score distance from the mean (second column) and the
ordinate values, y (third column), at any point for areas, (a), of the normal
distribution taken with respect to the mean (first column). The area is given
in terms of proportions of the area from the mean, beginning at the mean
itself, where (a) is equal to .00, and ranging to the theoretical limit of a
proportion of .50 above or below the mean. This latter distance is infinite
from the mean, and the ordinate value there is zero.

Table 10:3. Deviates (x/σ) in Terms of σ Units and Ordinates (y) for
Given Areas Measured from the Mean of a Normal Distribution Whose
Total Area = 1.00*

 Area from     x/σ     Ordinates   |   Area from     x/σ     Ordinates
 the Mean                          |   the Mean
     a          z          y       |       a          z          y

    .00        .000       .399     |
    .01        .025       .399     |      .26        .706       .311
    .02        .050       .398     |      .27        .739       .304
    .03        .075       .398     |      .28        .772       .296
    .04        .100       .397     |      .29        .806       .288
    .05        .126       .396     |      .30        .842       .280
    .06        .151       .394     |      .31        .878       .271
    .07        .176       .393     |      .32        .915       .262
    .08        .202       .391     |      .33        .954       .253
    .09        .228       .389     |      .34        .995       .243
    .10        .253       .386     |      .35       1.036       .233
    .11        .279       .384     |      .36       1.080       .223
    .12        .305       .381     |      .37       1.126       .212
    .13        .332       .378     |      .38       1.175       .200
    .14        .358       .374     |      .39       1.227       .188
    .15        .385       .370     |      .40       1.282       .176
    .16        .412       .366     |      .41       1.341       .162
    .17        .440       .362     |      .42       1.405       .149
    .18        .468       .358     |      .43       1.476       .134
    .19        .496       .353     |      .44       1.555       .119
    .20        .524       .348     |      .45       1.645       .103
    .21        .553       .342     |      .46       1.751       .086
    .22        .583       .337     |      .47       1.881       .068
    .23        .613       .331     |      .48       2.054       .048
    .24        .643       .324     |      .49       2.326       .027
    .25        .675       .318     |      .50         ∞         .000

* See also Table I, Appendix B.

The total area of the normal distribution is taken as equal to 1.00. The
ordinate value, y, at the mean of such a distribution is equal to .399. In
Table 10:2, and also in A of Fig. 10:1, .47 of the total group was in the higher
part; consequently, we need to locate a point above the mean that divides
the total distribution into two parts, viz., .47 and .53. This will be an (a) value
of .03 (see Table 10:3), since .50 - .47 = .03. Reading across the table for
this value of (a) gives the value of y, the ordinate, as .398.

If, as in Table 10:5, the proportionate size of the higher group is greater 
than .50, the point on the curve that divides the total group into its two 
parts is below the mean of the distribution. When p is equal to .64, as in B
in Fig. 10:1, the ordinate value, y, is at a point below the mean that includes 
.14 of the area, since .50 of the area lies above the mean. In Table 10:3, this 
will be an (a) value of .14. Reading across the table for this value, we see 
that y is equal to .374. 
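The two table lookups just described can be checked numerically. The sketch below is an illustration, not the author's procedure: it substitutes Python's standard-library `statistics.NormalDist` (Python 3.9 or later assumed) for the printed Table 10:3, and the function names `deviate_and_ordinate` and `area_from_mean` are hypothetical. Given the proportion p of the group in the higher part, it returns the signed deviate z of the dividing point and the ordinate y there.

```python
from statistics import NormalDist

# Unit normal curve: mean 0, standard deviation 1, total area 1.00.
UNIT_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def deviate_and_ordinate(p_higher: float) -> tuple[float, float]:
    """Locate the point that leaves a proportion p_higher of the area above it.

    Returns (z, y): z is the signed deviate x/sigma of that point (positive
    above the mean, negative below it) and y is the ordinate of the curve there.
    """
    if not 0.0 < p_higher < 1.0:
        raise ValueError("p_higher must lie strictly between 0 and 1")
    z = UNIT_NORMAL.inv_cdf(1.0 - p_higher)  # area below the point is 1 - p
    y = UNIT_NORMAL.pdf(z)                   # height of the curve at that point
    return z, y


def area_from_mean(p_higher: float) -> float:
    """The (a) entry of Table 10:3: area between the mean and the dividing point."""
    return abs(0.50 - p_higher)


if __name__ == "__main__":
    for p in (0.47, 0.64):  # the Table 10:2 and Table 10:5 cases
        z, y = deviate_and_ordinate(p)
        print(f"p = {p:.2f}  (a) = {area_from_mean(p):.2f}  z = {z:+.3f}  y = {y:.3f}")
```

Because the ordinates are computed rather than read from a three-place table, they should agree with Table 10:3 to about the third decimal: (a) = .03 and y ≈ .398 for p = .47, and (a) = .14 and y ≈ .374 for p = .64, with z negative in the second case because the point lies below the mean.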


Values for Biserial r from Table 10:4

The computations required for the final term of Formulas 10:4 and 10:4a 
for biserial r can be facilitated by referring to Table 10:4. The proportionate 
values of the frequencies for each dichotomized part in biserial correlation 
are given in the first two columns headed p and q respectively. The product 
of p and q in column 3 is useful for a number of problems but is not needed 
for Formulas 10:4 and 10:4a, since the ratio pq/y, needed for Formula 10:4, is
given in column 5, and the ratio p/y, needed in Formula 10:4a (the one
used for Table 10:2), is given in column 4.

The value for p, the proportionate number of frequencies in the higher 
group in Table 10:2, was .47. Locating this value near the bottom of the 
first column in Table 10:4 and reading across the row to column 4, we see
that the ratio of p/y is equal to 1.1815. This checks with the value obtained
in Table 10:2, where the ratio of p_h to y was computed. Columns 6 and 7 of
Table 10:4 are useful in computing point-biserial r, described in the next 
section. 
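The entries of Table 10:4 can be regenerated to check a lookup or to obtain values for p between the tabled steps. The sketch below assumes Python 3.9 or later and uses the standard-library `statistics.NormalDist` for the ordinate y in place of Table 10:3; the function name `table_10_4_row` is hypothetical, and the results may differ from the printed table in the last digit because exact ordinates are used.

```python
from math import sqrt
from statistics import NormalDist

UNIT_NORMAL = NormalDist()


def table_10_4_row(p: float) -> dict[str, float]:
    """Recompute the seven columns of a Table 10:4 row for a proportion p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    q = 1.0 - p
    # Ordinate at the point splitting the area into p and q; the curve is
    # symmetric, so the same y serves whether p lies above or below that point.
    y = UNIT_NORMAL.pdf(UNIT_NORMAL.inv_cdf(p))
    return {
        "p": p,
        "q": q,
        "pq": p * q,
        "p/y": p / y,              # column 4, used with Formula 10:4a
        "pq/y": p * q / y,         # column 5, used with Formula 10:4
        "sqrt(pq)": sqrt(p * q),   # column 6, used for point-biserial r
        "sqrt(p/q)": sqrt(p / q),  # column 7, used for point-biserial r
    }


if __name__ == "__main__":
    print({key: round(value, 4) for key, value in table_10_4_row(0.47).items()})
```

For p = .47 the recomputed p/y should come out close to the 1.1815 read from the table above.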

It will be observed that the values for Table 10:4 run only to p = .50. If
the proportionate size of the higher group in a biserial correlation is greater
than .50, the table may still be used by reversing the p and q values, or by
employing Formula 10:4 for biserial r. If the p and q values are reversed,
Formula 10:4a must be changed, and the mean of the lower part must be
substituted for the mean of the higher part, as follows:

$$
r_{\text{bis}} = \left(\frac{M_t - M_l}{\sigma_t}\right)\left(\frac{p_l}{y}\right)
\qquad [10:4b]\ \text{Biserial correlation changed for use with Table 10:4 when } p > .50
$$

The use of this formula will be illustrated in the following section.


Table 10:4. Values Employed in the Determination of Biserial and
Point-Biserial Correlations*

 (1)    (2)     (3)      (4)       (5)       (6)       (7)
  p      q      pq       p/y       pq/y     √(pq)    √(p/q)
 .01    .99    .0099    .3745     .3700     .0994     .1005
 .02    .98    .0196    .4132     .3935     .1380     .1428
 .03    .97    .0291    .4412     .4264     .1703     .1758
 .04    .96    .0384    .4640     .4452     .1959     .2042
 .05    .95    .0475    .4850     .4605     .2179     .2293
 .06    .94    .0564    .5038     .4736     .2375     .2526
 .07    .93    .0651    .5212     .4844     .2551     .2744
 .08    .92    .0736    .5380     .4950     .2713     .2950
 .09    .91    .0819    .5542     .5044     .2862     .3145
 .10    .90    .0900    .5698     .5129     .3000     .3333
 .11    .89    .0979    .5851     .5207     .3129     .3416
 .12    .88    .1056    .6000     .5278     .3249     .3693
 .13    .87    .1131    .6147     .5347     .3363     .3865
 .14    .86    .1204    .6289     .5410     .3470     .4035
 .15    .85    .1275    .6432     .5469     .3571     .4201
 .16    .84    .1344    .6576     .5523     .3666     .4365
 .17    .83    .1411    .6717     .5574     .3756     .4525
 .18    .82    .1476    .6860     .5627     .3842     .4685
 .19    .81    .1539    .7001     .5670     .3923     .4844
 .20    .80    .1600    .7143     .5714     .4000     .5000
 .21    .79    .1659    .7287     .5758     .4073     .5156
 .22    .78    .1716    .7430     .5793     .4142     .5311
 .23    .77    .1771    .7576     .5832     .4208     .5465
 .24    .76    .1824    .7720     .5868     .4271     .5620
 .25    .75    .1875    .7867     .5900     .4330     .5773
 .26    .74    .1924    .8015     .5929     .4386     .5928
 .27    .73    .1971    .8167     .5960     .4439     .6082
 .28    .72    .2016    .8318     .5989     .4490     .6236
 .29    .71    .2059    .8472     .6016     .4538     .6391
 .30    .70    .2100    .8628     .6037     .4582     .6547
 .31    .69    .2139    .8787     .6062     .4625     .6703
 .32    .68    .2176    .8949     .6086     .4665     .6860
 .33    .67    .2211    .9114     .6107     .4702     .7018
 .34    .66    .2244    .9279     .6125     .4737     .7178
 .35    .65    .2275    .9449     .6143     .4770     .7338
 .36    .64    .2304    .9623     .6159     .4800     .7500
 .37    .63    .2331    .9799     .6173     .4828     .7664
 .38    .62    .2356    .9979     .6187     .4854     .7829
 .39    .61    .2379   1.0164     .6200     .4877     .7996
 .40    .60    .2400   1.0355     .6214     .4899     .8165
 .41    .59    .2419   1.0548     .6222     .4918     .8336
 .42    .58    .2436   1.0744     .6230     .4935     .8509
 .43    .57    .2451   1.0947     .6241     .4951     .8686
 .44    .56    .2464   1.1156     .6247     .4964     .8864
 .45    .55    .2475   1.1369     .6254     .4975     .9045
 .46    .54    .2484   1.1590     .6258     .4984     .9230
 .47    .53    .2491   1.1815     .6262     .4991     .9417
 .48    .52    .2496   1.2048     .6265     .4996     .9508
 .49    .51    .2499   1.2287     .6266     .4999     .9802
 .50    .50    .2500   1.2534     .6266     .5000    1.0000

* This table was developed by E. K. Taylor of the Adjutant General’s Office, and is
reproduced by permission.

Biserial Correlation of a Vocabulary Test Item with the Total Test Score 

A vocabulary test* consisting of 80 multiple-choice items and designed
to measure general vocabulary knowledge at a high level of difficulty was
administered to 181 college students. Each test item was scored as correct
or incorrect. The “best” items will be those that most consistently differentiate
students with good vocabulary ability from students with only poor
vocabulary ability. However, independent ratings of this ability are usually
not available for evaluating the validity of vocabulary test items,
and consequently the items are often validated internally, that is, against the
total test score taken as the criterion of the ability. If, then, the total score on
the vocabulary test is assumed to measure vocabulary ability, its use as a
criterion will permit an item analysis of the test from which the relative
adequacy of each item can be determined. The items that correlate highest
with the criterion will be “best,” and those that correlate least with the
criterion will be “poorest.”

In a sense, this problem reverses the position of the criterion in the biserial 
correlation in Table 10:2, for this time the variable is the criterion. The 
dichotomy is the result obtained for one test item, scored as correct or incorrect, 
as follows: 

Test item (keyed answer: 5):

basal: 1. mean 2. malcontent 3. sly 4. justifiable 5. essential

The data for the 181 students’ results are cross-tabulated for biserial
correlation in columns 2 and 3 of Table 10:5. There is considerable overlapping
in the total test scores of those who responded correctly (column 3) and those
who responded incorrectly (column 2) to the particular item. However, a
somewhat greater proportion of those with higher total test scores did answer
the item correctly. Thus, the 4 students with a total test score of 70 or better
answered the item correctly; 7 out of 8 with a total test score of 67 to 69
answered it correctly, etc. Some degree of correlation is observable from an
inspection of the results, and it is positive. The means of both the lower and
the higher groups are computed in Table 10:5. The biserial coefficient is found
to be .55 by Formulas 10:4, 10:4a, or 10:4b, as follows:

* A special vocabulary test administered by the author to students in general psychology 
classes at the College of the City of New York. 




SERIAL CORRELATION 


269 


Table 10:5. Biserial Correlation of a Vocabulary Test Item with the Total
Test Score

   (1)          (2)         (3)       (4)      (5)     (6)       (7)      (8)      (9)
Total Test     Responses to Item     Total
  Scores     Incorrect    Correct    Group
                f_l         f_h       f_t       x'    f_t x'   f_t x'²   f_l x'   f_h x'
  70-72          0           4         4        7      28       196        0       28
  67-69          1           7         8        6      48       288        6       42
  64-66          2           7         9        5      45       225       10       35
  61-63          5          14        19        4      76       304       20       56
  58-60          3          16        19        3      57       171        9       48
  55-57          6          13        19        2      38        76       12       26
  52-54          5          16        21        1      21        21        5       16
  49-51          3          18        21        0       0         0        0        0
  46-48          7           8        15       -1     -15        15       -7       -8
  43-45          7           6        13       -2     -26        52      -14      -12
  40-42          8           2        10       -3     -30        90      -24       -6
  37-39          5           3         8       -4     -32       128      -20      -12
  34-36          8           2        10       -5     -50       250      -40      -10
  31-33          1           0         1       -6      -6        36       -6        0
  28-30          4           0         4       -7     -28       196      -28        0
             N_l = 65   N_h = 116  N_t = 181      +313               +62     +251
                                                   -187              -139      -48
                                               Σ =  126      2048      -77      203

M_t = 50.0 + 3(126/181) = 52.09
σ_t = 3√(2048/181 − (126/181)²) = 9.87
M_h = 50.0 + 3(203/116) = 55.25
M_l = 50.0 + 3(−77/65) = 46.45
p_h = 116/181 = .641
y = .374 (from Table 10:3, a = .14)
pq/y = .616 (from Table 10:4, where p_h is taken as q = .64)
p_l/y = .962 (from Table 10:4, p_l = .36)


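By Formula 10:4:

$$
r_{\text{bis}} = \left(\frac{M_h - M_l}{\sigma_t}\right)\left(\frac{pq}{y}\right)
= \left(\frac{55.25 - 46.45}{9.87}\right)(.616) = (.892)(.616) = .55
$$

the value for the ratio pq/y, .616, being obtained from column 5 of Table 10:4.


By Formula 10:4a:

$$
r_{\text{bis}} = \left(\frac{M_h - M_t}{\sigma_t}\right)\left(\frac{p_h}{y}\right)
= \left(\frac{55.25 - 52.09}{9.87}\right)\left(\frac{.641}{.374}\right) = (.320)(1.714) = .55
$$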
































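The value of the ratio p_h/y must be computed, because Table 10:4 runs only
to p = .50; the value of y, .374, is obtained from Table 10:3.

By Formula 10:4b:

$$
r_{\text{bis}} = \left(\frac{M_t - M_l}{\sigma_t}\right)\left(\frac{p_l}{y}\right)
= \left(\frac{52.09 - 46.45}{9.87}\right)(.962) = (.571)(.962) = .55
$$

the value of the ratio p_l/y, .962, being obtained from column 4 of Table 10:4.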

A test item correlation of .55 with a criterion, whether internal (as in this
case) or external and independent, is fairly satisfactory in that such an item
does result in some differentiation between those with more ability and those
with less ability in the attribute under consideration. If the vocabulary test
were to be shortened to the best 50 items, this is one that might be retained.
It is unlikely that there would be 50 or more other items that would yield a 
higher correlation with the criterion. Thus, by means of biserial r, a validity 
coefficient can be obtained for each item of a test, and the most effective 
items can be selected for any abbreviation or revision of the test itself. 
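The item analysis above can be scripted so that biserial r is obtained for every item in one pass. The sketch below reproduces the Table 10:5 calculation under stated assumptions: interval midpoints (71 for 70-72, and so on) stand in for the coded deviations x', which is equivalent to the short method with reference point 50.0 and interval 3; the ordinate y is computed with Python's `statistics.NormalDist` instead of being read from Table 10:3; and the names `TABLE_10_5`, `grouped_mean`, `grouped_sd`, and `biserial_r` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Table 10:5 rows: (midpoint of the total-score interval, f_l incorrect, f_h correct).
TABLE_10_5 = [
    (71, 0, 4), (68, 1, 7), (65, 2, 7), (62, 5, 14), (59, 3, 16),
    (56, 6, 13), (53, 5, 16), (50, 3, 18), (47, 7, 8), (44, 7, 6),
    (41, 8, 2), (38, 5, 3), (35, 8, 2), (32, 1, 0), (29, 4, 0),
]


def grouped_mean(pairs):
    """Mean of a grouped distribution given (midpoint, frequency) pairs."""
    n = sum(f for _, f in pairs)
    return sum(x * f for x, f in pairs) / n


def grouped_sd(pairs):
    """Standard deviation with divisor N, as in the short method."""
    n = sum(f for _, f in pairs)
    m = grouped_mean(pairs)
    return sqrt(sum(f * (x - m) ** 2 for x, f in pairs) / n)


def biserial_r(rows):
    """Return r_bis by Formulas 10:4, 10:4a and 10:4b, in that order."""
    lower = [(x, f_l) for x, f_l, _ in rows]
    higher = [(x, f_h) for x, _, f_h in rows]
    total = [(x, f_l + f_h) for x, f_l, f_h in rows]
    n_l = sum(f for _, f in lower)
    n_h = sum(f for _, f in higher)
    p = n_h / (n_l + n_h)  # proportion in the higher part (p_h)
    q = 1.0 - p            # proportion in the lower part (p_l)
    unit = NormalDist()
    y = unit.pdf(unit.inv_cdf(q))  # exact ordinate at the dividing point
    m_t, m_h, m_l = grouped_mean(total), grouped_mean(higher), grouped_mean(lower)
    s_t = grouped_sd(total)
    r_4 = (m_h - m_l) / s_t * (p * q / y)
    r_4a = (m_h - m_t) / s_t * (p / y)
    r_4b = (m_t - m_l) / s_t * (q / y)
    return r_4, r_4a, r_4b


if __name__ == "__main__":
    print([round(r, 3) for r in biserial_r(TABLE_10_5)])
```

Since M_t = p·M_h + q·M_l, the three formulas are algebraically identical, so the three returned values should agree apart from floating-point noise and round to the .55 found by hand.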


Point-Biserial Correlation

We saw that the preceding method for the biserial correlation of one 
continuously distributed variable with a dichotomized variable was based 
on the assumption not only that the relationship is linear but also that the 
dichotomized variable is in reality a quality or attribute that would yield 
a normal, bell-shaped distribution if it could be measured on a continuous 
scale. This latter assumption is the most reasonable one that can be made 
for the dichotomies in the examples in Tables 10:2 and 10:5, even though 
the ratings of skill in technical correspondence are admittedly not on a 
continuous scale, and it may be difficult to see how responses to a test item 
can ever yield a normal distribution, since such responses are usually scored 
as correct or incorrect. Nevertheless, normality is usually assumed for test 
items provided the test itself, as a whole, yields a normal distribution of test 
results. This assumption is based on the argument that the form of the whole 
is derived from the form of each of its parts. From this it follows that if the 
whole test yields a distribution of the normal type, each item in it would 
necessarily yield a normal distribution if the responses to it could be
differentiated on a sufficiently fine scale.

Situations often arise, however, in which there is no logical justification
for assuming the normality of the dichotomized attribute in biserial
correlation. One of the attributes may be in the form of a true dichotomy, as was true
of some of those described in Chapters 2-4 (for example, the dichotomy
male and female). Although sex differences in results for a continuously
distributed variable, such as intelligence test scores, are usually analyzed by





comparing the means of each sex group, we have already indicated that if
there is a mean difference, some degree of correlation exists between the
dichotomy and the variable. The smaller this difference, the less the degree
of correlation; and conversely, the greater the difference, the greater the
degree of correlation.

Questionnaire results in market research investigations often yield true 
dichotomies. Thus, the following questions of fact will be answered by Yes 
or No: Do you own an automobile? Do you own a piano? Do you have any 
brothers or sisters now living? Have you ever traveled in an airplane? Do 
you reside in San Francisco? The correlation of answers to such questions 
with the age, income status, education, intelligence, etc., of the respondents 
often gives insights into the relationships between a character of a population 
and a product. The classification of adults into the married and not married, 
parents and not parents, also represents true dichotomies.

Fortunately, there is available a method of correlation that can be employed 
to correlate two attributes, one of which is continuously distributed, the 
other being a true dichotomy, or a dichotomized variable that is not normally 
distributed. The assumption is made that a linear function is adequate to 
describe the relationship. As in biserial r, the continuous variable is not 
assumed to be normally distributed. The coefficient obtained is called
point-biserial r and is equal to the following:*


$$
r_{\text{pt-bi}} = \left(\frac{M_p - M_q}{\sigma_t}\right)\sqrt{pq}
\qquad [10:5]\ \text{Point-biserial } r
$$


The symbols here have the same meaning as those in Formula 10:4, except 
that for a true dichotomy “higher” and “lower” have no meaning, and 
hence one part of the dichotomy is symbolized as P and the other part as Q.
As Formula 10:4 was simplified to Formulas 10:4a and 10:4b for biserial r,
point-biserial r may also be obtained from measures derived from the total 
distribution and only one of the dichotomized parts, as follows: 


$$
r_{\text{pt-bi}} = \left(\frac{M_p - M_t}{\sigma_t}\right)\sqrt{\frac{p}{q}}
\qquad [10:5a]\ \text{Point-biserial } r
$$


The formula for point-biserial r is thus not very different from that for 
biserial r. The first ratio is identical in both Formulas 10:4 and 10:5, but the 
second ratio is different. Thus, in Formula 10:4, this ratio is based on the 
ordinate value y of the normal, bell-shaped curve at the point that divides 
the area of the distribution into two parts corresponding to the proportions 
of the total group in the respective divisions of the dichotomy. In Formula
10:5, however, the second ratio is based on the proportionate parts of
the dichotomized whole, p and q. 
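Formulas 10:5 and 10:5a can be written out directly. The sketch below is illustrative: the function names `point_biserial` and `pearson` are hypothetical, σ_t is taken with divisor N (`statistics.pstdev`), and the scores are invented for the demonstration. As a cross-check it relies on a well-known identity not stated in this chapter: point-biserial r equals the ordinary product-moment r when part P is scored 1 and part Q is scored 0.

```python
from math import sqrt
from statistics import fmean, pstdev


def point_biserial(values, in_p):
    """Point-biserial r by Formulas 10:5 and 10:5a, returned as a pair.

    values: measures on the continuous variable.
    in_p:   parallel booleans, True when the case belongs to part P.
    """
    if len(values) != len(in_p):
        raise ValueError("values and in_p must have the same length")
    part_p = [v for v, flag in zip(values, in_p) if flag]
    part_q = [v for v, flag in zip(values, in_p) if not flag]
    if not part_p or not part_q:
        raise ValueError("each part of the dichotomy needs at least one case")
    p = len(part_p) / len(values)
    q = 1.0 - p
    sigma_t = pstdev(values)  # divisor N, the total-group sigma
    r_10_5 = (fmean(part_p) - fmean(part_q)) / sigma_t * sqrt(p * q)
    r_10_5a = (fmean(part_p) - fmean(values)) / sigma_t * sqrt(p / q)
    return r_10_5, r_10_5a


def pearson(xs, ys):
    """Ordinary product-moment r, used only for comparison."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical scores and yes/no answers, for illustration only.
    scores = [12, 15, 9, 20, 18, 11, 14, 17, 8, 16]
    answered_yes = [False, True, False, True, True, False, False, True, False, True]
    print(point_biserial(scores, answered_yes))
    print(pearson(scores, [1.0 if flag else 0.0 for flag in answered_yes]))
```

Which group is labeled P only affects the sign of the result, consistent with treating the sign as arbitrary for a true dichotomy.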


* M. W. Richardson and J. M. Stalnaker, “A Note on the Use of Biserial r in Test 
Research,” Journal of General Psychology, 8:463, 1933. 





It should be emphasized that point-biserial r is written without a sign
when the dichotomy is a true one and “higher” and “lower” therefore have
no meaning. Under such circumstances, which part of the dichotomy is called
P and which part Q is a purely arbitrary decision. If the mean of part P of
the variable distribution proves to be less than the mean of the total
distribution, the result will be negative because of the order of M_p and M_t
in Formula 10:5a. In such cases the sign is ignored and the result is
interpreted by verbalizing the way in which the dichotomized attribute is
related to the variable attribute.


Triserial, Quadriserial, and Quintiserial r 

Jaspen* has recently developed formulas for serial correlation which are
useful for determining the linear correlation between a continuously
distributed variable and a variable segmented or classified into a few broad
classes. Although he gives the formula for serial correlation in general,
including biserial r, we shall present only his formulas for:

1. Triserial r, for the linear correlation of a continuously distributed 
variable with a trichotomized variable. 

2. Quadriserial r, for the linear correlation of a continuously distributed 
variable with a variable segmented into four classes. 

3. Quintiserial r, for the linear correlation of a continuously distributed
variable with a variable segmented into five classes. 

As in the case of biserial r, each of these coefficients is based on the 
assumption that the data of the segmented variable are derived from a 
variable that is normally distributed. We shall explain the computation of 
triserial r, using Cureton’s data for the clerical proficiency test (dichotomized
in Table 10:2), and then for reference present Jaspen’s formulas for
quadriserial and quintiserial r.


Triserial r 

The procedure for computing triserial r is similar to that used for biserial r 
in that the means of each segmented class of the continuously distributed 
variable are compared. The greater the difference between these means, the 
greater the correlation; the less the difference, the less the correlation. The 
formula for triserial r is as follows: 


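$$
r_{\text{tri}} = \frac{y_h M_h + (y_c - y_h) M_c - y_c M_l}
{\sigma_t \left[ \dfrac{y_h^2}{p_h} + \dfrac{(y_c - y_h)^2}{p_c} + \dfrac{y_c^2}{p_l} \right]}
\qquad [10:6]\ \text{Triserial } r
$$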


* Nathan Jaspen, “Serial Correlation,” Psychometrika, 11:23-30, 1946. I wish to thank
the author, and Dr. H. O. Gulliksen, the editor of Psychometrika, for letting me see this
article prior to publication and giving me permission to incorporate the formulas in this
chapter. I have changed some of the symbols in Jaspen’s formulas to make them uniform
with those used in this book.





where 

l, c, and h symbolize the three classes of the segmented variable, i.e.,
l = the lowest part, c = the central (or middle) part, and h = the highest
part.

M_l, M_c, and M_h are the respective means of each of the segmented groups
of the continuously distributed variable.

σ_t is the standard deviation of the continuously distributed variable.

p_l, p_c, and p_h are the respective proportions of the total group in each of
the three segmented classes.

y_l, y_c, and y_h are the respective values of the ordinates on a normal curve
at points that divide the total distribution into three parts: y_h is taken at
the upper point of division, so that the proportion of the area above it equals
p_h, and y_c at the lower point of division, so that the proportion of the area
between the two points equals p_c. (The ordinate value for p_l is always zero
because this “point” is at the lowest end of the distribution.)

The computation of triserial r is illustrated in Table 10:6. The scores for the 
continuously distributed clerical proficiency test results are listed in the first 
column as they were for the computation of biserial r in Table 10:2. The 
trichotomized criterion ratings are given in columns 2, 3, and 4. The test 
results for persons with no skill ratings are distributed in column 2; for 
those with ratings of some skill, in column 3; and for the skilled, in 
column 4. The distribution for the total group is given in column 5. The mean 
for the lowest group (no skill) is obtained by the short method (see Table 7:6) 
and shown in columns 6 and 7. Similarly, the mean for the central (or middle) 
group (some skill) is shown in columns 8 and 9, and for the highest group 
(skilled) in columns 10 and 11. The mean and standard deviation of the 
total group are obtained by the short method (see Table 7:10) and shown 
in columns 12, 13, and 14. The ordinate values for y at the bottom of the 
table are obtained from Table 10:3 (or from Table I, Appendix B). 
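Formula 10:6 and the quadriserial and quintiserial formulas share one pattern, and that pattern is easy to program. In each case a class contributes (ordinate at its lower boundary minus ordinate at its upper boundary) times its mean to the numerator, and that difference squared, divided by its proportion, to the bracketed sum, with an ordinate of zero at both extreme ends. The sketch below expresses this pattern for any number of ordered classes. Three points are assumptions rather than quotations from Jaspen: the reading of the formulas is our interpretation of the symbol definitions above, the boundary ordinates are computed with Python's `statistics.NormalDist` instead of Table 10:3, and the name `serial_r` is hypothetical.

```python
from statistics import NormalDist


def serial_r(means, proportions, sigma_t):
    """Serial r of a continuous variable with an ordered k-class segmentation.

    means and proportions are listed from the lowest class to the highest.
    With two classes the expression reduces to biserial r (Formula 10:4);
    with three it takes the form of Formula 10:6.
    """
    if len(means) != len(proportions) or len(means) < 2:
        raise ValueError("need matching means and proportions for 2+ classes")
    if any(p <= 0 for p in proportions) or abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must be positive and sum to 1")
    unit = NormalDist()
    # Ordinates at the class boundaries, bottom end first; both ends are 0.
    bounds = [0.0]
    cumulative = 0.0
    for p in proportions[:-1]:
        cumulative += p
        bounds.append(unit.pdf(unit.inv_cdf(cumulative)))
    bounds.append(0.0)
    numerator = 0.0
    denominator = 0.0
    for j, (m, p) in enumerate(zip(means, proportions)):
        diff = bounds[j] - bounds[j + 1]  # lower-boundary minus upper-boundary ordinate
        numerator += diff * m
        denominator += diff ** 2 / p
    return numerator / (sigma_t * denominator)


if __name__ == "__main__":
    # Two-class check with the Table 10:5 summary values (biserial case).
    m_l = 50.0 + 3 * (-77 / 65)
    m_h = 50.0 + 3 * (203 / 116)
    sigma_t = 3 * (2048 / 181 - (126 / 181) ** 2) ** 0.5
    print(round(serial_r([m_l, m_h], [65 / 181, 116 / 181], sigma_t), 2))
```

A useful property of this form: the ordinate differences telescope to zero, so if every class has the same mean the coefficient is exactly zero, as it should be.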

The value of triserial r for the correlation between the criterion ratings
and the clerical proficiency test results is .71, which practically coincides with
the value of biserial r for these data. In this case, therefore, triserial r did not
give a result any different from that obtained by biserial r; the mean
difference between the no skill and the some skill subgroups is not sufficiently
great to make a difference in the value of r.


Quadriserial r 

Jaspen’s formula for quadriserial r is as follows: 


$$
r_{\text{quad}} = \frac{y_h M_h + (y_d - y_h) M_d + (y_b - y_d) M_b - y_b M_l}
{\sigma_t \left[ \dfrac{y_h^2}{p_h} + \dfrac{(y_d - y_h)^2}{p_d} + \dfrac{(y_b - y_d)^2}{p_b} + \dfrac{y_b^2}{p_l} \right]}
\qquad [10:7]\ \text{Quadriserial } r
$$

where l, b, d, and h represent the four parts of the segmented variable: l the
lowest, b the next lowest, d the next highest, and h the highest part. M, y,
and p are the means, ordinates, and proportions, as in Formula 10:6, and σ_t
is the standard deviation of the total distribution of the continuously
distributed variable.


Table 10:6. Triserial Correlation of Clerical Proficiency Test Results with Criterion Ratings Trichotomized

[Table body not recoverable from the source scan.]


TETRACHORIC CORRELATION


275

Quintiserial r 

Jaspen’s formula for quintiserial r is as follows: 

$$
r_{\text{quint}} = \frac{y_h M_h + (y_d - y_h) M_d + (y_c - y_d) M_c + (y_b - y_c) M_b - y_b M_l}
{\sigma_t \left[ \dfrac{y_h^2}{p_h} + \dfrac{(y_d - y_h)^2}{p_d} + \dfrac{(y_c - y_d)^2}{p_c} + \dfrac{(y_b - y_c)^2}{p_b} + \dfrac{y_b^2}{p_l} \right]}
\qquad [10:8]\ \text{Quintiserial } r
$$

where l, b, c, d, and h represent the five parts of the segmented variable:
l the lowest, b the next lowest, c the central or middle, d the next highest,
and h the highest. M, y, and p are the means, ordinates, and proportions as
in Formula 10:6, and σ_t is the standard deviation of the total distribution of
the continuously distributed variable.

C. TETRACHORIC CORRELATION 
Purpose of the Method 

A measure of linear correlation for the cross-tabulation of the data of
dichotomized variables that are normally distributed is provided by the
tetrachoric correlation coefficient. However, without the diagrams prepared
by L. L. Thurstone and his associates* for determining the coefficient, the
method is too arduous from the computational point of view for any
practical use. The complete equation for r_t involves a series with many powers
of r_t. The method is valuable in two types of situations in which measures of
the degree of linear correlation are needed, and fortunately, Thurstone’s
diagrams make it easy to use.

Both situations concern bi-variates. The first situation involves bi-variates,
either or both of which yield dichotomized data, or coarse groupings of results
into a few classes that can be readily combined to make a dichotomy. These
are the kinds of data characteristic of many market research investigations
and of the social psychologists’ studies of attitudes, preferences, etc. In the
second situation, all the bi-variates yield continuous distributions of
quantitative data, but a labor-saving device is required when many correlations are
to be computed. Such occasions arise whenever all the inter-correlations
between five or more variables must be obtained, or a series of test items must
be correlated with a dichotomized criterion. For example, in a correlational
analysis based on 10 variables, 45 (that is, 10 × 9/2) inter-correlation
coefficients must be computed (cf. Chapter 18). If each of these 10 variables is dichotomized near
the median or mean of its distribution and the data are cross-tabulated into

* L. Cheshire, M. Saffir, L. L. Thurstone, Computing Diagrams for the Tetrachoric Correlation
Coefficient, Univ. of Chicago Bookstore, Chicago, 1933.





2 by 2 (fourfold) tables, the coefficients can be readily obtained with
Thurstone’s diagrams. Also, in the item analysis of a test, n correlation coefficients
will be needed for n items; thus in a test of 100 items, 100 coefficients must be computed.

The Computation of Tetrachoric r (r_t)

If the data to be correlated are already dichotomized, the first step in
computing r_t is the cross-tabulation, in a fourfold table, of the two variables to
be correlated. If, on the other hand, r_t is being used as a labor-saving device
in estimating r for two continuously distributed variables, the first step is to
dichotomize each variable near its median or mean so as to put the data
in a form suitable for cross-tabulation in a 2 by 2 correlation matrix.

We shall first discuss the computation of r_t in this latter situation, using
the data in Table 9:1. The product-moment correlation of these data gave a 
coefficient of .67 for the relationship between the heights and the weights of 
151 infants. The mean weight was 21.8 pounds and the mean height was 
29.4 inches. Both distributions are first dichotomized at convenient points 
near their respective means. The height variable is dichotomized at the 
lower limit of the class interval which includes the mean, viz., 29.25 inches. 
The weight variable is dichotomized at 21.25 pounds. The results of
cross-tabulating the data of these variables into a fourfold table are shown in
Table 10:7. 
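The dichotomizing and cross-tabulating step can be sketched as follows. This is an illustration under stated assumptions: the function name `fourfold_table` is hypothetical, the cell layout follows Table 10:7 (a = below average on variable 1 and above on variable 2, b = above on both, c = below on both, d = above on variable 1 and below on variable 2), and a measure exactly at a cut is counted as above it, which cannot happen when the cut is a class-interval limit such as 29.25 inches. The measurements are invented, not the data of Table 9:1.

```python
def fourfold_table(var1, var2, cut1, cut2):
    """Return the cell counts (a, b, c, d) for two measures dichotomized at cut1, cut2."""
    if len(var1) != len(var2):
        raise ValueError("the two variables must have the same number of cases")
    a = b = c = d = 0
    for x1, x2 in zip(var1, var2):
        above1 = x1 >= cut1  # variable 1 (columns)
        above2 = x2 >= cut2  # variable 2 (rows)
        if above2 and not above1:
            a += 1
        elif above2 and above1:
            b += 1
        elif not above1:
            c += 1
        else:
            d += 1
    return a, b, c, d


if __name__ == "__main__":
    # Hypothetical measurements (inches, pounds), not the data of Table 9:1.
    heights = [28.0, 30.5, 29.0, 31.0, 27.5, 30.0]
    weights = [20.0, 23.0, 22.0, 24.5, 19.0, 21.0]
    print(fourfold_table(heights, weights, cut1=29.25, cut2=21.25))
```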


Table 10:7. The Dichotomization for Tetrachoric Correlation of the Height-
Weight Measurements of 151 Infants

(Original data in Table 9:1; M_weight = 21.8 pounds, M_height = 29.4 inches)

                                     Height (Variable 1)
                                 Below Average    Above Average     n_W
Weight          Above Average      a   16           b   60           76
(Variable 2)    Below Average      c   54           d   21           75
                n_H                    70               81        N = 151


The formula for the computation of r_t is rather complex; however, the
following formula, which yields a quadratic equation, usually provides a
satisfactory estimate for normally distributed variables:






$$
\frac{bc - ad}{y_1 y_2 N^2} = r_t + \frac{z_1 z_2}{2}\, r_t^2
\qquad [10:9]\ \text{Tetrachoric coefficient for linear correlation of dichotomized bi-variates}
$$

where 

a is the number of frequencies in cell a (Quadrant II).

b is the number of frequencies in cell b (Quadrant I).

c is the number of frequencies in cell c (Quadrant III).

d is the number of frequencies in cell d (Quadrant IV).

y_1 is the ordinate value of the first variable at the point at which the
distribution was dichotomized (from Table 10:3 or Table I, Appendix B).

y_2 is the ordinate value of the second variable at the point of
dichotomization.

N is the total number of cases or instances cross-tabulated for correlation.

z_1 is the deviate distance, in terms of x/σ_x, of the point of dichotomization
from the mean of the first distribution (from Table 10:3, or Table I,
Appendix B).

z_2 is the deviate distance of the point of dichotomization from the mean
of the second distribution.

For the height-weight data dichotomized and cross-tabulated in Table 10:7,
the values for the solution of tetrachoric r are as follows:

a = 16.
b = 60.
c = 54.
d = 21.

y_1 = .397 (from Table 10:3, where p_1 = 81/151 = .54 and (a) therefore equals .04).

y_2 = .399 (from Table 10:3, where p_2 = 76/151 = .50 and (a) therefore equals .00+).

N = 151.

z_1 = .090 (from Table 10:3, where (a) = .036).*

z_2 = .010 (from Table 10:3, where (a) = .003).*

Substituting these values in Formula 10:9 gives the following:

$$
\frac{(60)(54) - (16)(21)}{(.397)(.399)(151)^2} = r_t + \frac{(.090)(.010)}{2}\, r_t^2
$$

$$
\frac{2904}{3611.746803} = r_t + .00045\, r_t^2
$$

Expressed as a quadratic equation, this becomes:

$$
.00045\, r_t^2 + r_t - .804043 = 0
$$

* These values of z_1 = .090 and z_2 = .010 are interpolated from Table 10:3 for (a) values
of .036 and .003 respectively. It will be observed that if either z_1 or z_2 has a value of zero,
the second term of Formula 10:9 will be zero, and consequently the solution for r_t is
readily obtained from the simple equation. In the above example the value of the r_t² term
is practically zero, and its retention in the equation actually does not affect the value of r_t
to two decimal places.





For this problem, the values of the coefficients are: b = 1.0; a = .00045; and
c = −.804043. This quadratic equation must be solved for r_t. The value of r
in the usual formulation of a quadratic, viz.,

$$
ar^2 + br + c = 0
$$

is equal to:

$$
r = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
$$

Substituting the values for a, b, and c, we find that r_t is:

$$
r_t = \frac{-1.0 \pm \sqrt{1.0 - 4(.00045)(-.804043)}}{2(.00045)}
$$

One value for this equation is

$$
r_t = \frac{-1.0 + 1.00072}{.0009} = \frac{+.00072}{.0009} = .80
$$

The other value for this equation is

$$
r_t = \frac{-1.0 - 1.00072}{.0009} = \frac{-2.00072}{.0009} \approx -2223
$$

This value for r_t is absurd, since r cannot exceed ±1.00.

Thus, the tetrachoric coefficient, r_t, is found to be .80. This is somewhat
higher than the value of .67 obtained by the method of product-moment 
correlation in Fig. 9:13. A higher value is to be expected with Formula 10:9 
which, as we pointed out, is a simplification of the complete equation for 
r_t. The value of r_t is even higher when Thurstone’s diagrams instead of
Formula 10:9 are used for these data; this is shown in the next section. 
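The quadratic solution of Formula 10:9 can be automated. The sketch below implements only the two-term approximation worked above, not the complete tetrachoric series and not Thurstone's diagrams. The function name `tetrachoric_two_term` is hypothetical, and the ordinates y_1, y_2 and deviates z_1, z_2 are supplied by the caller, just as they were read from Table 10:3 in the example.

```python
from math import sqrt


def tetrachoric_two_term(a, b, c, d, y1, y2, z1, z2):
    """Solve (bc - ad)/(y1*y2*N^2) = r + (z1*z2/2) r^2 for r (Formula 10:9).

    The root lying within -1..+1 is returned; the other root is discarded,
    as in the worked example, because a correlation cannot exceed 1.
    """
    n = a + b + c + d
    lhs = (b * c - a * d) / (y1 * y2 * n ** 2)
    k = z1 * z2 / 2.0      # coefficient of the r-squared term
    if k == 0.0:
        return lhs         # the equation is then linear in r
    # k r^2 + r - lhs = 0, solved with the usual quadratic formula.
    disc = 1.0 + 4.0 * k * lhs
    if disc < 0:
        raise ValueError("no real solution for these inputs")
    roots = [(-1.0 + s * sqrt(disc)) / (2.0 * k) for s in (1.0, -1.0)]
    admissible = [r for r in roots if -1.0 <= r <= 1.0]
    if not admissible:
        raise ValueError("no root lies within -1..+1")
    return admissible[0]


if __name__ == "__main__":
    # Values used in the worked height-weight example (Table 10:7).
    r_t = tetrachoric_two_term(16, 60, 54, 21, y1=0.397, y2=0.399, z1=0.090, z2=0.010)
    print(round(r_t, 2))
```

With the example's values the admissible root should round to .80, and the discarded root is the large negative value rejected in the text.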


Estimating Tetrachoric Correlation with Thurstone’s Diagrams 

In order to use Thurstone’s Diagrams in determining a tetrachoric
coefficient, three proportions are needed: one for cell c of the 2 by 2
cross-tabulation, and one each for the total number of cases in the lower, or below-average,
part of each dichotomized variable. These are shown in bold-face type in
Table 10:8 as (a), (b), and (c), where (a) is the proportion of cases for the
first variable whose measures are below the mean; (b), the proportion of cases
for the second variable whose measures are below the mean; and (c), the
proportion of the total group whose cross-tabulated measures are in cell c, i.e.,
those below average on both variables. These symbols (a), (b), and (c)
correspond to those used in Thurstone’s Diagrams.

In computing the three proportions needed, sufficient accuracy is obtained 
by carrying the calculations to three decimal places and then rounding off to 
two places. In order to have an independent check on the accuracy of these 
computations, it is well to calculate the p values of all four cells and compare 
the sums of the proportions for the rows and columns with the marginal 
totals at the bottom and right of the fourfold table. This has been done in 





Table 10:8, in which the frequency values in Table 10:7 have been converted 
into p values. The coefficient, r_t, is given as .74 in Thurstone’s Diagrams,
a somewhat higher value than the r = .67 obtained by Pearson’s product- 
moment method. 
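The proportions needed to enter Thurstone's Diagrams follow mechanically from the four cell counts. The sketch below assumes the cell layout of Tables 10:7 and 10:8, carries each proportion to three places and then rounds it to two as the text recommends, and switches to (a′) and (c′) with a sign-reversal flag when (a) exceeds .50. The function name `thurstone_entries` and its dictionary keys are hypothetical.

```python
def thurstone_entries(a, b, c, d):
    """Entries for Thurstone's diagrams from the cell counts of a fourfold table.

    Layout as in Tables 10:7 and 10:8: a and b are above average on variable 2,
    c and d below; a and c are below average on variable 1.
    """
    n = a + b + c + d

    def two_places(x):
        # Carry to three decimal places, then round off to two.
        return round(round(x, 3), 2)

    prop_a = two_places((a + c) / n)  # (a): below average on variable 1
    prop_b = two_places((c + d) / n)  # (b): below average on variable 2
    prop_c = two_places(c / n)        # (c): below average on both variables
    if prop_a <= 0.50:
        return {"a": prop_a, "b": prop_b, "c": prop_c, "reverse_sign": False}
    # The diagrams stop at (a) = .50: use (a') and (c') and reverse the sign read off.
    return {"a'": two_places((b + d) / n), "b": prop_b,
            "c'": two_places(d / n), "reverse_sign": True}


if __name__ == "__main__":
    print(thurstone_entries(16, 60, 54, 21))  # counts of Table 10:7
```

For the counts of Table 10:7 this should give (a) = .46, (b) = .50 and (c) = .36, the entries used with the diagrams.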


Table 10:8. The Cross-Tabulated Data of Table 10:7 Expressed as
Proportions of N

(For Tetrachoric Correlation by Thurstone’s Diagrams)

                                         Infants’ Height (Variable 1)
                                       Below Average     Above Average
Infants’ Weight    Above Average       a  .10            b  .40             .50
(Variable 2)       Below Average       c  .36  (c)       d  .14  (c′)       .50  (b)
                                          .46  (a)          .54  (a′)      1.00  (N = 151)

r_t = .91 (from Diagrams)


The tetrachoric correlation method does not give a satisfactory estimate of 
linear correlation if the p value of either part of a dichotomized variable is 
less than .05. It is for this reason that Thurstone does not present diagrams 
for (a) values of less than .05. 

Thurstone’s Diagrams provide a separate page for different values of (a),
beginning with p = .05 and ending with p = .50. If the value of (a) is greater
than .50, then (a') and (c') are used in estimating r_t (cf. Table 10:8). Whenever
this change to (a') must be made, the sign of the value of r_t indicated in
the appropriate diagram must be changed; if it is negative, it is written as
positive. This is the case because the diagrams are set up for the p values of
Quadrant III (positive), whereas the p value for cell d is for Quadrant IV,
which is negative.
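Thurstone's Diagrams are a graphical device and are not reproduced in code here. As a rough cross-check on a value read from them, the sketch below uses a different, well-known shortcut, Pearson's "cosine-pi" approximation to the tetrachoric coefficient. It is an approximation, not the diagram method, and the function name is hypothetical.

```python
import math

def tetrachoric_cos_pi(agree_1, agree_2, disagree_1, disagree_2):
    """Approximate tetrachoric r from the four cells of a fourfold table.

    agree_1, agree_2: the cells in which both measures are above average or
    both are below average; disagree_1, disagree_2: the two remaining cells.
    Cells may be frequencies or proportions; only their ratio matters.
    r_t ~ cos(pi / (1 + sqrt(agree_1 * agree_2 / (disagree_1 * disagree_2))))
    """
    if min(agree_1, agree_2, disagree_1, disagree_2) <= 0:
        raise ValueError("every cell must be greater than zero")
    ratio = (agree_1 * agree_2) / (disagree_1 * disagree_2)
    return math.cos(math.pi / (1.0 + math.sqrt(ratio)))

# Table 10:8: cells b (.40) and c (.36) agree; cells a (.10) and d (.14) disagree.
print(round(tetrachoric_cos_pi(0.40, 0.36, 0.10, 0.14), 2))   # prints 0.73
```

For Table 10:8 the approximation gives about .73, close to the .74 read from the diagrams. Treating the other pair of cells as the agreement cells gives the same value with a negative sign.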


EXERCISES 

1. Describe three different problems for which Spearman’s rank-difference method 
would be appropriate. 

2. Under what circumstances is Spearman’s rank-difference method inadequate to 
measure the correlation between two variables?

3. Compute a rank-difference correlation for the bi-variate data correlated in
Exercise 11, Chapter 9. Compare the value obtained for rho with that previously
obtained for r.

4. With the data in Table 5:14, dichotomize the average grade scores of the college
freshmen at the mean of the distribution, and obtain a biserial correlation between 
their intelligence test scores and their dichotomized grade scores. 

5. Dichotomize the achievement ratings in Table 10:1 into two groups equal in size, 
and obtain the biserial correlation between the attitude test scores and the
dichotomized achievement ratings.

6. Use the following distribution of data to compute the point-biserial correlation
between marital status and the gross amount of life insurance sold over a period 
of a year: 


Marital Status of Insurance Salesmen and Total Amount of Life Insurance
Sold (in $1000's) over a Period of One Year (M = Married; S = Single)

Salesmen's   Insurance     Salesmen's   Insurance     Salesmen's   Insurance
Status       Sold          Status       Sold          Status       Sold

M            [illegible]   S            498           M            632
S            [illegible]   M            671           M            599
S            [illegible]   M            618           M            651
S            544           M            810           M            723
M            456           [illegible]  530           M            410
M            331           [illegible]  448           S            670
M            898           [illegible]  314           S            617
M            676           M            592           S            769
M            641           S            803           S            862
M            520           M            533           S            550
M            648           S            526           M            375
S            408           M            624           S            426
M            960           S            435           S            547
S            273           S            493           M            746
S            208           M            901           [illegible]  582
S            621           S            712           [illegible]  804
S            549           S            520           [illegible]  237
M            871           M            416           M            759
M            715           M            550           M            577
M            768           S            573           S            154
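For Exercise 6, the point-biserial coefficient can be computed with a short function. This sketch assumes the common textbook form r_pb = (M_p - M_q) / sigma * sqrt(p * q), where M_p and M_q are the means of the two categories, p and q their proportions of N, and sigma the standard deviation of all the scores (divisor N); the notation may differ from this chapter's own formula, and the function name is hypothetical. The demonstration data are invented, since several entries in the table above are illegible.

```python
import math

def point_biserial(categories, scores, p_category):
    """Point-biserial r between a dichotomous attribute and a variable.

    categories: the attribute of each case (exactly two distinct labels).
    scores: the paired values of the variable.
    p_category: the label whose mean is M_p; the other label supplies M_q.
    """
    if len(categories) != len(scores):
        raise ValueError("categories and scores must be paired")
    if len(set(categories)) != 2 or p_category not in categories:
        raise ValueError("need exactly two categories, one of them p_category")
    p_scores = [s for c, s in zip(categories, scores) if c == p_category]
    q_scores = [s for c, s in zip(categories, scores) if c != p_category]
    n = len(scores)
    mean = sum(scores) / n
    sigma = math.sqrt(sum((s - mean) ** 2 for s in scores) / n)  # divisor N
    p, q = len(p_scores) / n, len(q_scores) / n
    m_p = sum(p_scores) / len(p_scores)
    m_q = sum(q_scores) / len(q_scores)
    return (m_p - m_q) / sigma * math.sqrt(p * q)

# Hypothetical cases (not the salesmen's data): three married, two single.
print(round(point_biserial(["M", "S", "M", "S", "M"], [5, 3, 6, 2, 4], "M"), 3))  # prints 0.866
```

Because r_pb equals Pearson's product-moment r with the two categories scored 1 and 0, the result can be cross-checked against an ordinary r computation.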


7. Using the intelligence test scores in Table 5:14, dichotomize each distribution at 
its respective mean, and obtain the tetrachoric correlation coefficient between the 
scores of the college freshmen and of their best friends. Compare this result with 
that obtained in terms of r in Exercise 10, Chapter 9. 

8. What assumptions underlie the use of: 

a. serial r (biserial r, triserial r, etc.)

b. point-biserial r 

c. tetrachoric r 























PART TWO 

Sampling and Analytical Statistics 






CHAPTER 11 


Samples and Sampling Techniques 


A. INTRODUCTION 

We pointed out in Chapter 1 that in the development of statistical method 
it is useful to make a distinction between descriptive statistics, on the one 
hand, and sampling and analytical statistics, on the other. We indicated there 
that sampling and analytical statistics consist essentially in the study of 
statistical populations or universes in terms of the data of samples derived 
from them. In purely descriptive statistics, no distinctions are made between 
the sample and the universe—the part and the whole—the data being treated 
as if they constituted a whole. 

All statistics, however, are in a basic sense descriptive, regardless of whether
the data to be reduced are the data of samples or the complete
data of a census. The fundamental methods of descriptive statistics that have
been developed in the preceding chapters can be briefly outlined as follows:

I. For the Data of Non-Variable Attributes 

1. The classification into categories of the data of non-variable attributes, 
yielding dichotomous or polytomous subdivisions. 

2. The enumeration of instances by category, yielding the statistical 
frequency. 

3. The development of proportions, percentages, and ratios for summary 
and comparative purposes. 

4. The cross-tabulation of categorical data of two or more attributes, to 
show the relationships between attributes. 

5. The determination of the degree of relationship between the
cross-tabulated data of two attributes by a technique of correlation.

II. For the Data of Variable Attributes 

1. The organization into class intervals of the data of variates, yielding the 
frequency distribution. 

2. The summarization of variable data by the centile point method, 
yielding centiles, vigintiles, quintiles, deciles, terciles, quartiles, etc., and the 
derivation, from these values, of measures of dispersion and deviation. 

3. The summarization of variate data by the algebraic method of moments, 
yielding the arithmetic mean, the standard deviation, and the Coefficient of 

Relative Variation; the derivation from these values of z score and Standard 
score measures, and a psychograph or profile chart, for comparative purposes. 

4. The cross-tabulation of the data of two variables, yielding the correlation
chart or bi-variate distribution.

5. The determination of the degree of relationship between the data of 
bi-variates by appropriate techniques of correlation. 

III. Appropriate Graphic Methods 

The treatment and summarization of both non-variable and variable data 
by graphic methods. 

In the remainder of this book we shall be concerned not only with the 
application of these methods of descriptive statistics to the data of samples, 
but also with the subject matter fundamental to sampling and analytical 
statistics: 

I. The Techniques of Sampling 

In using statistical methods with sample data, we must know the requirements
in sampling in order to apply methods of statistical analysis correctly
and to draw sound inferences from the results. This is the subject matter of
the present chapter.

II. Probability Theory and Statistical Inference

In studying statistical universes or populations by means of analyzing
sample data, we utilize the implications of the mathematical theory of
probability. We need to know not only what these fundamental implications
are, but how to apply them to particular problems that arise in
psychology and related fields. We shall see that sound statistical inference
is a form of probable inference and is developed statistically in terms of
what is called a Test of Significance. These problems of sampling and
analytical statistics will be considered in Chapters 12-15.

III. Correlational Analysis

We have already seen in Part I that the method of correlation is fundamental
in statistics for the study of relations between phenomena. Correlation
is essential to the discovery of law and order in the social sciences as well
as in many aspects of the biological and physical sciences. In fact, Karl
Pearson* took the extreme view that nature itself is essentially statistical in
character: that law and order are formulations of what is empirically observed
as characteristic of the average. Correlations between attributes or qualities
are taken as the basis for law and order in nature. Irrespective of Pearson’s
point of view, the analysis of correlative relations has come to form the most
exploratory statistical technique of the biological and social sciences. Hence
some of the basic aspects of correlational analysis will be considered in
Chapters 16-18.

* Karl Pearson, Grammar of Science, Adam and Charles Black, London, rev. ed., 1900.

Census vs. Sample 

Several distinctions in the terminology of descriptive and sampling statistics 
were suggested in Chapter 1. We need first a set of terms that will always 
clearly differentiate the part from the whole. In other words, we must be 
able unambiguously to distinguish a sample, or part, from the whole with 
which it is identified. The term sample is used to designate the part. The 
whole is called the statistical population, or the collective or statistical universe. 

Whereas the observations or measurements of a sample yield sample data, 
the observations or measurements of an entire population yield census data. 
The analytical methods used in sampling statistics are developed for treating 
samples in the study of populations. Not only are the methods of descriptive
statistics often sufficient for the summarization and presentation of census 
data, but they are indispensable for reducing the data of samples. 

Were it possible and feasible to do so, a census rather than a sample of a
population would usually be obtained for study. The only errors in a census
would be errors of observation and measurement, for a census would include all
instances or members of the group to be studied. The statistical analysis of the
results would be comparatively simple, provided the measurements were
made under scientific conditions of control and standardization, for under
such conditions the errors in the measurements would be distributed according
to known laws of probability and hence could be taken into account in
interpreting the results.

The normal, bell-shaped distribution is also the normal probability curve,
and is often called the normal curve of error. It shows the way in which purely
chance errors, or their effects, are usually distributed over an extended series
of observations and measurements. The controls of scientific method, long 
employed in the laboratories of the physicist and chemist, are techniques 
that have been devised to eliminate bias in a series of results. In fact, scientific 
method per se can well be described in terms of the operations for observation 
and measurement that yield results whose values are just as likely to be 
affected positively as negatively by the errors that necessarily appear in any 
process of measurement. 

However, for reasons indicated in the next section, it is usually either 
impossible or not feasible to obtain a census. A census is not necessary for the 
scientific study of a population; if it were, there would be very little in the 
way of scientific knowledge. Under well-designed and controlled conditions, a 
sample of observations or measurements can be obtained such that the nature 
or character of a statistical population can be logically and soundly inferred 
from the study and analysis of this sampled fraction of the whole. True, 
errors of sampling will enter, but again, if the method of sampling is properly
controlled, we can be confident that they will operate as chance errors; that
is, they will be just as likely to affect the results positively as negatively.
Knowing how such errors behave, we can allow for them in interpreting our 
results. 


Sampling Is a Research Technique 

Scientific method has until recently been portrayed in psychology and the 
social sciences essentially as a controlled and standardized procedure for 
making a series of observations or measurements under unbiased conditions 
such that the procedure could be repeated by two or more researchers working 
independently of each other. But scientific method in the biological and social 
sciences involves more than laboratory or field control over observation or 
measurement. It is also a technique for obtaining the correct sample for study
in lieu of the data of a population or universe. It is just as important, for the
sound interpretation of the data of a sample, to know the conditions under
which it was obtained as to know the processes used for the observations or
measurements. Scientific sampling techniques are as integral to the design of
an experiment as are the processes used in observation and measurement.*

* Cf. in this regard, R. A. Fisher, The Design of Experiments, Oliver & Boyd, London,
2nd ed., 1937.

This chapter, then, emphasizes an aspect of scientific method, the technique
of sampling, that is fundamental to research problems, whether they
be the health value of a certain vitamin, an evaluation of different learning
methods, or consumer opinion about a new kind of vacuum cleaner. Sampling
is integral to scientific method whenever a census is impossible or not feasible.

A Gallup Poll

Some of the preceding points can be illustrated by an example from social
psychology, viz., the increasingly important field of public opinion research.
Let us say we wish to know the attitude of the voters in the United States
toward Great Britain’s request for a post-war loan of several billion dollars.
A census may appear theoretically possible, but it is obviously not feasible
to interview each franchised citizen in order to obtain his answer to this
question; the United States Census of 1940 cost about 50 million dollars.
So we shall take a sample from the people whose opinions are to be obtained.
First, we need precisely to define the population or universe to be studied.
Second, we need to obtain a sample of opinions that will be representative of
the opinions of the whole group. This representativeness cannot be perfect;
but if the sample is adequate in size and properly drawn, we can allow for the
chance errors that will necessarily be present. Such errors include sampling
errors, those inherent in the operation of sampling, and errors of observation,
those inherent in the operation of getting each respondent’s opinion in an
interview. Although we are here concerned with reducing sampling errors to
a satisfactory minimum, errors of observation also affect the result and hence 
must likewise be kept to a minimum by scientific controls in the technique 
of interviewing, the formulation of the question, the recording of each answer, 
and the processing of the results.* 

On October 1, 1945, Dr. George Gallup, Director of the American Institute
of Public Opinion, reported the public’s opinion on the above question worded 
as follows: 

“ENGLAND PLANS TO ASK THIS COUNTRY FOR A LOAN OF THREE TO 
FIVE BILLION DOLLARS TO HELP ENGLAND GET BACK ON ITS FEET. 
WOULD YOU APPROVE OR DISAPPROVE OF THE UNITED STATES 
MAKING SUCH A LOAN?”†

From a carefully constructed national sample of the voters in the United 
States, each member of which was personally interviewed about his opinion 
on this question, Gallup reported the following results: 


Approve        27%
Disapprove     60%
No Opinion     13%


Three-fifths of the sample were opposed to such a loan, the ratio for those 
holding an opinion being more than 2 to 1 against the proposition. This is 
the sample result. Can we have confidence in it to the extent of concluding 
that all the voters in the United States had a similar division of opinion on 
the question? It is not feasible to check such sample results against a census 
of the opinions of all the voters; hence any confidence in the result must be 
based on our knowledge of the procedures employed and the reputation of the 
American Institute of Public Opinion, which has been making such surveys 
over a period of years. The sampling methods used by the Institute in the 
above survey were the same as those it used to forecast, with an error no
greater than 2 percentage points, the outcome of five fairly recent national
elections in five countries: Australia, Canada, Great Britain, Sweden, and
the United States. In the 1945 election in Great Britain, the British Institute
of Public Opinion forecast the gains of the Labor party with an average error
of only 1%. Thus, although we have no independent criterion against which
to check the public opinion poll on the loan to Britain question, we do have
a basis for confidence in the result because of the Institute’s previous
outstanding successes in predicting election results from samples taken by means
of the same technique. This technique is known as stratified sampling by the
quota system, and will be described in a later section.

* The techniques of public opinion research have been recently summarized by Hadley
Cantril and his research associates in the Office of Public Opinion Research at Princeton
University in Gauging Public Opinion, Princeton University Press, 1944.

Particularly detailed attention to questionnaire methods of public opinion and market
research is given by A. B. Blankenship in Consumer and Opinion Research: The
Questionnaire Technique, Harper, New York, 1943.

† George Gallup, “Loan to Britain Opposed,” New York World-Telegram, October 1,
1945.

Errors of Sampling 

How about the errors that necessarily enter into Gallup’s sample results? 
Neither Gallup, Elmo Roper, nor the others doing research on public opinion 
ever claim that such a result is entirely free of sampling and measuring errors. 
Consequently, the problem is to determine the probable effect of these errors 
on the percentage results reported by Gallup. Is the effect of these errors 
likely to have been so great that we cannot be confident that at least a 
majority (50%+) of all the voters would have opposed such a loan? Or is it 
likely to have been so small that we can be confident that at least 55% of 
all the voters would have opposed such a loan? Is it likely that as many as 
two-thirds (66⅔%) of all the voters would have opposed it?

It would be premature to describe the methods used in answering these 
questions (they are discussed in the following chapters), but it is in order to 
emphasize the following points: 

1. When sample results are based upon a scientifically controlled sampling
technique, both sampling errors and errors of measurement should operate
as chance errors.

2. The effect of chance errors on a result can be dealt with satisfactorily by
methods of analytical statistics that have been developed specifically for
this purpose.

3. These statistical methods are based on the implications of the
mathematical theory of probability.

4. Their application consists essentially in setting up appropriate statistical
Tests of Significance and, in the light of such tests, formulating conclusions
in which confidence is warranted.

A Test of Significance, like those developed in Chapter 13, Section D, 
would indicate that we can be confident that at least a clear majority of the 
voters’ opinions, as of the time the poll was made, was opposed to such a
loan to England. 
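The arithmetic behind the questions raised above can be previewed with a short sketch. It is only a preview of the methods of later chapters and rests on two assumptions not in the text: the simple-random-sampling formula sqrt(p(1 - p)/n) for the standard error of a percentage (which may understate the error of a quota sample), and a sample size N that is not reported here. The value 3000 below is purely hypothetical, as is the function name.

```python
import math

def poll_summary(approve, disapprove, no_opinion, n):
    """Summarize a three-way poll result reported in percentages.

    n is the number of persons interviewed. The standard error uses the
    simple-random-sampling formula sqrt(p * (1 - p) / n).
    """
    total = approve + disapprove + no_opinion
    p = disapprove / total
    se = math.sqrt(p * (1 - p) / n)
    return {
        "against_to_for": disapprove / approve,  # among those holding an opinion
        "p_disapprove": p,
        "standard_error": se,
        # distance of the sample figure from each benchmark, in standard errors
        "z_above_50": (p - 0.50) / se,
        "z_above_55": (p - 0.55) / se,
        "z_above_two_thirds": (p - 2 / 3) / se,
    }

HYPOTHETICAL_N = 3000  # assumption: the actual sample size is not given in the text
for key, value in poll_summary(27, 60, 13, HYPOTHETICAL_N).items():
    print(f"{key}: {value:.3f}")
```

The ratio 60/27 is about 2.22, the "more than 2 to 1" of the text. With the hypothetical N = 3000, .60 lies roughly 11 standard errors above .50 and more than 5 above .55, but about 7 below two-thirds; a smaller N would narrow all of these margins.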

B. STATISTICAL POPULATIONS OR UNIVERSES 
The Statistical Universe 

A statistical population is not to be confused with a population of people. 
In the preceding example, the statistical population under consideration was
not a group of people; rather, it consisted of the opinions on the particular
question held by all the voters in the United States. Opinions are obtained
from people, just as are measures of aptitude, personality ratings, etc. It
is the attributes, behavior, or traits of people, therefore, that constitute
statistical populations or universes. Populations may also consist of the
attributes or qualities of other things, as for instance, the behavior of dice or
coins, or the behavior of the molecules in a given volume of gas.

The concept population or universe in statistics thus denotes the whole
which includes all the observations or measurements of a non-variable or
variable characteristic. If the characteristic is non-variable and dichotomous,
such as the presence or absence of radios in U. S. homes, the statistical
universe will be the number of all homes equipped with radios and the number
of all homes not equipped with radios. If the characteristic is variable, such
as the radio listening behavior of people, the statistical universe for a given
period of time will be the different amounts of time all the people spend
listening to their radios. The
statistical universe could be the dichotomized behavior of a coin tossed an 
infinite number of times, and consisting of the number of times the coin 
landed heads up and the number of times it landed tails up. Or the statistical 
universe could be the variable behavior of a collection of coins tossed an 
infinite number of times, and consisting of a distribution of the frequencies 
with which the different possible combinations of heads and tails (including 
all heads and all tails) occurred. 
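The expected make-up of such a hypothetical coin universe can be written down directly if the coins are assumed to be fair and tossed independently: each of the 2^n head-tail arrangements of n coins is then equally likely, so the long-run proportion of tosses showing exactly k heads is C(n, k)/2^n. A minimal sketch under that assumption (the function name is hypothetical):

```python
from fractions import Fraction
from math import comb

def heads_distribution(n_coins):
    """Long-run proportions of 0, 1, ..., n_coins heads for fair coins.

    Each of the 2**n_coins head-tail arrangements is taken as equally likely,
    and comb(n_coins, k) of them show exactly k heads.
    """
    arrangements = 2 ** n_coins
    return [Fraction(comb(n_coins, k), arrangements) for k in range(n_coins + 1)]

# Four coins: proportions for 0, 1, 2, 3, and 4 heads.
print([str(f) for f in heads_distribution(4)])  # prints ['1/16', '1/4', '3/8', '1/4', '1/16']
```

Any actual series of tosses is a finite sample from this infinite universe, so its observed proportions will only approximate these values.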

The concept population or universe in statistics thus defines all the measure¬ 
ments or observations of the attribute or behavior of the phenomenon being 
studied. The actual behavior of dice as observed would constitute a sample 
from a universe whose instances theoretically comprise a population infinite 
in size. In microphysics, the behavior of a molecule in a given volume of gas 
could constitute a universe from which a very small sample of such behavior 
might be measured. 


Finite and Infinite Populations 

From the foregoing examples of statistical populations, or universes, it
should be evident that they may be finite or infinite in size. The universes of
public opinion and market research investigations are usually taken as finite,
even though they may not be readily susceptible to an exact count. The
fundamental calculus of probabilities has been developed for universes
considered infinite in size, as for example, the behavior of dice or coins tossed an
infinite number of times. Fortunately, from the point of view of applying the 
calculus of probabilities to finite universes, the latter can often be treated as 
if they were infinite in size, provided that the universe is large in relation to 
the size of the sample. Appropriate samples drawn from large finite populations 
will have statistical implications similar to those of similar samples drawn 
from infinite populations. 
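One standard way to see why a large finite universe behaves like an infinite one is the finite population correction. In the usual formula for the standard error of a proportion drawn by simple random sampling (a formula not developed until later chapters), sampling without replacement multiplies the infinite-universe value by sqrt((N - n)/(N - 1)), where N is the size of the universe and n the size of the sample. The sketch below assumes that formula; the function name is hypothetical.

```python
import math

def se_of_proportion(p, n, population_size=None):
    """Standard error of a sample proportion under simple random sampling.

    If population_size is None the universe is treated as infinite;
    otherwise the finite population correction sqrt((N - n) / (N - 1))
    for sampling without replacement is applied.
    """
    if not 0 <= p <= 1 or n < 1:
        raise ValueError("p must be a proportion and n at least 1")
    se = math.sqrt(p * (1 - p) / n)
    if population_size is not None:
        if population_size < 2 or n > population_size:
            raise ValueError("need a universe of at least 2 and no larger sample")
        se *= math.sqrt((population_size - n) / (population_size - 1))
    return se

# Ratio of the finite-universe to the infinite-universe value for n = 1000.
for size in (2_000, 100_000, 10_000_000):
    ratio = se_of_proportion(0.5, 1_000, size) / se_of_proportion(0.5, 1_000)
    print(f"N = {size:>10,}: factor {ratio:.4f}")
```

For a sample of 1000 the factor is about .71 when the universe holds 2000 members but about .995 when it holds 100,000, which is why a universe that is large relative to the sample can be treated as if it were infinite.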

Actual vs. Hypothetical Universes 

All infinite universes are necessarily hypothetical. An actual universe, on 
the other hand, is one such that the behavior of all its members or instances 
is susceptible to observation or measurement. Such a universe obviously must
be finite in size. A behavior or quality of the adults of a nation constitutes
an actual universe provided we are referring to the behavior of those adults
at a given time. But if we are speaking of a trait or characteristic of a racial
group, such as stature or skin pigmentation, then presumably we are referring
to a hypothetical universe that includes people still to be born, as well as
those living today. Such a universe is part actual and part hypothetical. 
Obviously only its existing part can be sampled at any given time. 

From the point of view of the statistical analysis of sample results, it makes 
little difference whether the universe is actual or hypothetical, provided each 
can be appropriately sampled. In one sense, this distinction between an 
actual and a hypothetical universe is exemplified by any situation in which 
an attempt is made to predict the future behavior of the phenomenon studied. 
The basis for present observations is a sample from a necessarily existing 
universe, whereas the predictions concern a universe not yet come into being.
This contrast, interestingly enough, is characteristic of actuarial analysis, 
in which, for example, mortality tables are established and the attempt is 
made to predict the length of life of living people. Similarly, in polls of voters’ 
preferences for candidates prior to elections, the attempt is made not only to 
ascertain their preferences at the time of the poll, but also to predict their 
preferences when the voters actually go to the polls. In other words, prior to 
the election the behavior of actual voting constitutes a hypothetical universe. 
It cannot be sampled directly, but people’s attitudes and opinions about the 
way they expect to vote can be sampled and studied. 

We cannot be certain that sample results in September or October will 
necessarily forecast the outcome of a November election, but we can obtain 
preliminary “straws-in-the-wind.” Like the actuary, the researcher in public 
opinion can make estimates or predictions with considerable confidence, 
because of the ever-increasing, successful experience in this field. Unusual 
events that might radically alter the outcome of an election, such as the 
death of one of the candidates just prior to election day, could of course 
vitiate the prediction based on a poll of voters’ preferences. There are no 
certainties in the statistics of probability, but there are reasonable expectancies. 

C. SAMPLES AND THE TECHNIQUES OF SAMPLING

A sample is an actual collection of observations or measurements of an
attribute, a behavior, etc. A sample is therefore necessarily finite. There are,
however, different kinds of samples. They can be differentiated in terms of
either (1) the way in which they are related to the universes of which they
are a part, or (2) the technique or method used in obtaining them. Either
criterion, or both together, may be relevant to the way in which a particular
sample is to be characterized. Samples may be representative or biased;
random or stratified-and-random; adequate or inadequate; controlled or
uncontrolled; restricted or unrestricted, etc.

Representative Samples 

A sample of observations or measurements is representative of the universe 
of which it is a part if it is a replica of that universe. Thus, if a sample of 
information about the nativity of 100 people living in a given area shows that 
75 are native-born and 25 are foreign-born, and a census of the nativity of 
all the people in that area reveals that 75% are native-born and 25% are 
foreign-born, then the sample of 100 observations is a replica of the statistical
universe; the sample is truly representative. A sample of the intelligence test
scores of 1000 salesmen would be representative of the intelligence test scores
of all salesmen if the distributions for both sample and universe were similar
in form and if their means and standard deviations were the same. 

A representative sample, in other words, is one that yields a division or
distribution of the attribute or behavior being studied that is the same as
its division or distribution in the statistical population. The question of the
representativeness of a sample is directly concerned with the behavior or
trait under study, and only indirectly with other behavior or traits.
Does a sample of measurements of variable x yield values that are the same as
the measurements of variable x derived from the entire statistical universe?
If the sample value x_s equals the universe value x_u, the representativeness
is perfect. In a market research investigation of consumer-use of a product,
a sample is not necessarily representative
of consumer-use in the universe, even though the sample is composed of the
proportions of the sexes, of the different age groups, of the different income
or economic groups, etc., that are truly characteristic of the proportions of
these attributes in the total group of people in that particular market area.
The sample is truly representative only if the proportions of those people
using the given product and those not using it are identical with the
corresponding proportions in the population. A sample is a replica of a universe,
and therefore representative, provided that the distribution of the observations
or measurements under investigation is identical in both sample and universe.

In order to determine whether a sample result is truly representative of a 
universe, we need to know certain facts about that universe. Ordinarily, 
however, we do not have the information about the universe necessary for 
an absolute check on the representativeness of the sample. If we did, there 
obviously would be no need to work with a sample; we could proceed with a 
summary and analysis of the data for the universe itself. In practice,
therefore, it is rarely possible to describe a sample as truly representative of a
population. Instead, the character of a sample is usually described in terms
of the methods used in obtaining it rather than in terms of its
representativeness of the statistical population. By this criterion of method, samples may
be classified as follows:

1. Random samples.

2. Stratified-random samples.

3. Accidental or uncontrolled samples.

Either of the first two techniques, when used with samples of adequate
size, should yield a result that is sufficiently representative of the statistical
population to warrant confidence in the conclusion drawn from it.
Representativeness is a question of degree, not of mutually exclusive alternatives.

Before considering the methods and techniques of sampling, we shall
consider the obverse of a representative sample, viz., a biased sample.

Biased Samples 

A biased sample is definitely non-representative of the statistical universe
of which it is a part. The scientist obviously uses every possible means to
avoid biased samples. Consequently, when they occur, it is because of
carelessness or ignorance of proper sampling methods, rather than any intent
on the part of the investigator to obtain unsatisfactory results. 

One of the most famous examples of a biased sample in public opinion 
research occurred during the presidential campaign of 1936. The Literary 
Digest had conducted mail-ballot polls of voters’ preferences during several 
other presidential campaigns, and the predictions had been fairly successful. 
In 1936, however, its poll failed completely when it predicted the overwhelm¬ 
ing defeat of Roosevelt and the election of Landon. It was found that the 
ballots mailed by the magazine in this poll were addressed only to samples
of listed telephone subscribers and automobile owners throughout the United
States. Unfortunately for the success of this poll, the voting preferences of
people whose homes had telephones and who owned automobiles were not
representative of the people generally. Many more people who had no
telephones or automobiles voted for Roosevelt than for Landon in 1936.
Interestingly enough, during the same campaign Dr. Gallup conducted a national
poll by both mail-ballot and personal interview methods. Giving the factors
of listed telephone subscribers and registered automobile owners, as well as 
other relevant factors, their proper weight in the total result, he predicted 
rather closely not only the outcome of the election but also the error in the 
Literary Digest poll. 

Not all kinds of universes are equally difficult to sample without bias. An 
analysis of the red and white cells in a drop of blood is sufficient to give a 
satisfactorily representative sample of the distribution of all the red and 
white cells in a person’s total blood supply. Although the population of cells 
may be somewhat variable in different parts of the circulatory system, the 
blood cells are sufficiently homogeneous in their distribution so that a drop 
taken from the finger tip will give a satisfactory unit for sample analysis. 
Here the sample is a cell count based on a small unit of the whole. But a 
single geographical unit of people in the United States, such as any city or 
any state, would in no way be satisfactory as a base for studying the
anthropometric characteristics of all the people in this country, let alone their
psychological traits and social attitudes. The homogeneity characteristic of
a drop of blood does not characterize the distribution of traits or behavior
among people. That is, a group of people drawn from a small geographical
unit of the whole will not be likely to provide a satisfactory base for a sample
of such variable qualities as stature, aptitude, and opinion. In fact, as we
pass from physical anthropology to psychological functions and social
attitudes, the variability or heterogeneity of statistical universes increases. And,
pari passu, they become more difficult to sample without bias.

Constant vs. Chance Errors 

In contrast to chance errors, of either sampling or measurement, the errors 
that result from biasing factors affect the results of an experiment or investigation
in a constant way. Hence, biasing errors are usually called constant
errors of sampling or of measurement. How can these constant errors be 
avoided in sampling? 

As the experience of research workers in a given field accumulates, it is 
increasingly possible to obtain samples that avoid bias to any disturbing or 
distorting extent. The accumulation of sampling experience, checked against 
census results or against the predicted implications of sample results, reveals
various factors that make for bias and makes it possible to avoid them sedulously.
There would obviously be bias in any attempt to sample voters’ preferences
with a group of registered Democrats as the only base.

Fortunately two sampling techniques are available that permit most, if 
not all, of the pitfalls of biased samples to be avoided. In fact, only with 
one or the other of them, or as close an approximation as is possible, can we 
hope to obtain satisfactory samples for the study of a universe. These two
methods are random sampling and stratified-random sampling. Random
sampling techniques yield a result for the universe being studied that becomes
increasingly satisfactory as the size of the sample is increased. Stratified-random
sampling also yields a result that becomes increasingly satisfactory
not only as the size of the sample is increased, but also as certain additional 
control factors are introduced. 
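The statement that a random sample becomes increasingly satisfactory as it grows can be made concrete with the familiar standard-error formula for a sample proportion, sqrt(p(1 - p)/n). That formula is not developed in this section; the sketch below simply assumes it (ignoring the finite-population correction), with an arbitrary universe proportion p = 0.5, to show that quadrupling the sample halves the chance error.

```python
import math

def standard_error(p, n):
    """Chance (sampling) error of a proportion from a random sample of size n.
    Assumes an infinite universe, so no finite-population correction."""
    return math.sqrt(p * (1 - p) / n)

p = 0.5  # assumed (hypothetical) proportion in the universe
for n in (100, 400, 1600, 6400):
    # Each fourfold increase in n halves the standard error.
    print(n, standard_error(p, n))
```

The formula says nothing about constant errors: it describes only how chance errors shrink with sample size.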

Bias from Inadequate Methods of Observation and Measurement 

Bias in surveys of the character, behavior, or opinions of people is often
greater because of constant errors of observation and measurement than
because of constant errors of sampling. The science of sampling has been
sufficiently well developed to enable the researcher (1) to reduce chance 
sampling errors to a satisfactory minimum and (2) to avoid serious biases in 
sampling. 

Biasing errors of observation and measurement, however, are all too likely 
to plague any investigation in which the information sought is about or 
derived from people. W. E. Deming, of the United States Bureaus of the 
Census and the Budget, has listed thirteen factors that make for errors in 
surveys.* Only a few of them lead to errors that are due to inadequate sampling,
as, for example, bias resulting from non-response or from late responses,
from unrepresentative selections of respondents, or from an unrepresentative date
for a survey. The remaining sources of error are largely due to faulty procedures
of observation and measurement, such as (1) variability in the
answers or data furnished by respondents; (2) errors peculiar to the kind of
method employed in canvassing respondents (mail vs. telephone vs. telegraph
vs. direct interviews; intensive vs. extensive interviews, etc.); (3) bias and
variations caused by faulty interviewing; (4) bias due to respondents’ reactions
to knowledge about the sponsor of the investigation; (5) bias from imperfectly
designed questionnaires; (6) processing errors in the coding, tabulation, and
statistical summarization of the data; and (7) errors due to misinterpreting
the questionnaire data or the statistical results, or to personal bias in interpretation.

Biasing errors of observation and measurement resulting from one or more 
of the above causes are as common in a census of a population as in a sample. 
Nor does the taking of a census in itself eliminate biases of this kind. If a 
census is taken, but the methods of enumeration or measurement are faulty, 
only the sampling error will be minimized or eliminated. It is because of errors 
of observation and measurement that no categorical answer can be given to 
the question of the size of sample necessary to guarantee an adequate result or 
a result that will be accurate within a given margin of error. Adequacy depends
not only on the character and size of the sample, but also on the
techniques and procedures used to obtain the desired information from the
respondents and to summarize it. The pre-testing of techniques and preliminary
test-tube surveys are essential if biases in observation and measurement
are to be avoided.




RANDOM SAMPLES—THE PRINCIPLE OF RANDOMIZATION 

Definition 


A random sample is one such that each instance or member of the universe
being sampled has an equal chance of appearing in the sample. In other words,
each observation or measurement of a random sample has the same opportunity,
no more and no less, as all the other instances of the universe, of
appearing in the sample. Sometimes random sampling is called “simple
sampling.” However, there is nothing simple about applying the method, for
it involves the basic technique of control characteristic of all sound sampling 
procedures. 
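The definition can be checked by simulation. In this sketch the universe size N = 50, the sample size n = 10, and the number of trials are arbitrary assumptions, and Python's `random.sample` stands in for a tested lottery. Each member should turn up in close to n/N = 20% of the samples.

```python
import random
from collections import Counter

def inclusion_frequencies(N, n, trials, seed=1):
    """Fraction of `trials` random samples (size n, drawn without replacement
    from members numbered 1..N) in which each member appears."""
    rng = random.Random(seed)
    universe = range(1, N + 1)
    counts = Counter()
    for _ in range(trials):
        counts.update(rng.sample(universe, n))  # one lottery draw of n members
    return {member: counts[member] / trials for member in universe}

# Hypothetical universe of 50 members, samples of 10.
freqs = inclusion_frequencies(N=50, n=10, trials=20_000)
print(min(freqs.values()), max(freqs.values()))  # both near n/N = 0.2
```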

The theory and development of sampling statistics are based upon the 
assumption that a sample of a universe is a random sample. To the extent

* W. E. Deming, “On Errors in Surveys,” American Sociological Review, 9:359-369, 1944.

that samples are not random, we cannot have confidence in generalizations 
about universes, made in the light of an analysis of sample results.* 


The Technique of Random Sampling 

How can a sample be established so that each instance or member of the 
universe to be studied will have an equal opportunity of appearing in the 
sample? In order to guarantee that a given sample will be a random sample, 
a careful control technique is necessary. It consists essentially in numbering 
each member or instance of the universe, and then drawing the desired size 
of sample by means of a lottery technique which in itself has been tested for 
randomness. 


Random Numbers 

Since mechanical lottery procedures are often inconvenient to use, tables
of numbers that have been tested for randomness are available.† A table of
truly random numbers is one such that any digit from 0 to 9 has an equal
chance of appearing in any position in the table (see Table II, Appendix C).
Digits are combined by rows or columns to give two-place, three-place, or
n-place numbers, according to the needs of the particular study. Hence, if
the members or instances of a universe are consecutively numbered from 1 to
N (the total size of the universe), a random sample of 100 or 1000, or whatever
size is desired, can be obtained from the universe by means of a table
of random numbers. The instances in the universe are chosen for the sample
when their numbers come up in the table of random numbers.
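The procedure just described (number the universe from 1 to N, then read k-place numbers from a table) might be sketched as follows. The helper names are hypothetical, and the digit string is generated with Python's `random` module as a stand-in for a printed table such as Table II of Appendix C. Numbers outside 1 to N, and repeats, are passed over, so each member can enter the sample only once.

```python
import random

def random_digit_table(num_digits, seed=None):
    """Hypothetical stand-in for a printed table of random digits."""
    rng = random.Random(seed)
    return "".join(str(rng.randint(0, 9)) for _ in range(num_digits))

def draw_with_random_numbers(N, n, digits):
    """Pick n distinct members from a universe numbered 1..N by reading the
    digits in consecutive k-place groups, k being the number of digits in N."""
    k = len(str(N))
    chosen = []
    for start in range(0, len(digits) - k + 1, k):
        number = int(digits[start:start + k])
        # Numbers outside 1..N, and numbers already drawn, are passed over.
        if 1 <= number <= N and number not in chosen:
            chosen.append(number)
            if len(chosen) == n:
                return chosen
    raise ValueError("ran out of digits before the sample was complete")

print(draw_with_random_numbers(N=20, n=3, digits="0512991703"))  # [5, 12, 17]
sample = draw_with_random_numbers(N=850, n=25, digits=random_digit_table(3000, seed=42))
```

With a printed table, its digits would simply be supplied in the order they are read, in place of the generated string.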

It is often impossible to use a lottery method that will guarantee a random 
sample of a statistical universe. When a universe is taken as infinite in size, 
it is obviously impossible to number all its members and draw from it a random
sample by the procedure just described. Also, it is usually not feasible or
possible to assign numbers to all the members of a large finite population, 
such as all the voters in the United States. Ordinarily it is impractical to 
assign a number even to each voter in one state or one city in order to draw 
a random sample with the aid of a table of random numbers. Only in a great 
emergency, as in World War II and the Selective Service System, is it feasible 
to number large finite populations and draw samples by a truly random 
method, whether by a mechanical lottery device or by a table of random 
numbers. 

In practice, research workers have therefore had to develop other techniques
for the randomization of a sample. These can yield satisfactory results,

* A stratified-random sample is a special case of random sampling and hence does not 
constitute an exception to this basic assumption. 

† R. A. Fisher and F. Yates, Statistical Tables for Biological, Agricultural and Medical
Research, pp. 82-87, Oliver and Boyd, London, 1938, Table 33; M. G. Kendall and B. B.
Smith, “Randomness and Random Sampling Numbers,” Journal of the Royal Statistical
Society, 101:147-166, 1938; L. H. C. Tippett, Random Sampling Numbers, Cambridge
Univ. Press, Cambridge, 1927.
even though the guarantee of randomness characteristic of a table of random 
numbers or a lottery device may be lacking. 

Alphabetization as a Basis for Sampling

If there is available an alphabetical listing of the names of persons whose
attributes or behavior is to constitute a universe to be studied by sampling
methods, then each ith case (5th, 10th, 20th, etc., depending on the size of
sample desired in relation to the size of the finite population) can be drawn,
and the sample will yield a result fairly comparable to that obtained with a
table of random numbers. Certain precautions in using such lists are necessary
to avoid distortion of the samples. As Stephan* has pointed out, some
of the names on the list or some records in a file may be missing or may have been
temporarily removed. If records are missing because they are being used,
their absence may distort the sample because there may be some correlation 
between the active use of these records and the trait or behavior being studied. 
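Drawing each ith case from a list might look like the sketch below. Choosing the starting point at random among the first i entries is an added assumption (the text says only "each ith case"), and the list of names is hypothetical. In keeping with Stephan's caution, the list should be checked for missing records before it is used.

```python
import random

def every_ith(records, i, seed=None):
    """Systematic sample: a random start among the first i records, then every
    ith record after it (the random start is an assumption, not from the text)."""
    start = random.Random(seed).randrange(i)
    return records[start::i]

# Hypothetical alphabetized list of 1,000 names; every 20th name gives a 5% sample.
names = [f"Name {k:04d}" for k in range(1, 1001)]
sample = every_ith(names, i=20, seed=7)
print(len(sample), sample[:2])
```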

One of the best examples of the randomization of samples in market research
is provided by C. E. Hooper’s radio research organization, which conducts
surveys of home listening to radio programs; the surveys are published at
regular intervals and are widely used by the radio and advertising industries.†
Hooper’s organization draws random samples from the lists of telephone
subscribers in the major population areas in the United States. The names
for each sample are located on these lists by means of a mechanical lottery
device. Hooper’s statistical universe is strictly defined in terms of listed
telephone subscribers and is sufficiently large that no subscriber will be
included in a random sample more often than once a year. Over the calendar
year, each listed subscriber has an equal opportunity with all other listed
subscribers of being included in the samples chosen. The names drawn for
a given sample are used only once during the year in order to avoid errors
of measurement that might result from too frequent telephone calls to the
same homes, and the consequent possible annoyance of the respondents.
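The text does not describe how Hooper's lottery device spreads the draws over a year. One simple way to meet the same two conditions (every listed subscriber has the same chance of being drawn, and nobody is called twice in a year) is to put the whole list into a single random order and cut it into consecutive, non-overlapping samples. The sketch below is a hypothetical illustration of that idea, not Hooper's procedure; the list size, number of surveys, and sample size are invented.

```python
import random

def yearly_rotation(subscribers, surveys_per_year, sample_size, seed=None):
    """Cut one random ordering of the whole list into non-overlapping samples,
    so that no subscriber is drawn for more than one survey in the year."""
    if surveys_per_year * sample_size > len(subscribers):
        raise ValueError("list too small for non-overlapping samples")
    order = list(subscribers)
    random.Random(seed).shuffle(order)  # plays the part of the lottery device
    return [order[k * sample_size:(k + 1) * sample_size]
            for k in range(surveys_per_year)]

directory = [f"subscriber {k}" for k in range(10_000)]  # hypothetical list
samples = yearly_rotation(directory, surveys_per_year=24, sample_size=400, seed=3)
print(len(samples), len(samples[0]))  # 24 400
```

Because the ordering is random, every subscriber is equally likely to land in any given survey's sample, while the slicing guarantees that no name is reused within the year.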

In the United States Census of 1940, some of the information sought was 
obtained by sampling methods, for the first time in the history of the census.‡
A sampling technique somewhat analogous to taking every ith case from a
file was employed by the interviewers. Each census taker secured information 
on certain questions (nativity, usual occupation, social security status, marital 
status, etc.) from each 20th person. The sample obtained was therefore based 
on 5% of all the people in the United States. 


* F. F. Stephan, “Practical Problems of Sampling Procedures,” American Sociological
Review, 1:569-580, 1936.

† M. N. Chappell and C. E. Hooper, Radio Audience Measurement, Stephen Daye, New
York, 1944.

‡ Cf. F. F. Stephan, W. E. Deming, and M. H. Hansen, “The Sampling Procedure of
the 1940 Population Census,” Journal of the American Statistical Association, 35:615-630,
1940.
Interviewing each ith case of a universe can yield a satisfactory sample
provided the research worker uses a method that will avoid basing the selection
of each ith case on any factors that could conceivably bias the results
sought. If the interviewers adhere strictly to asking, say, each 20th case the
sample questions and go from family unit to family unit in an order predetermined
from their location on a map (the order must not be influenced
by the appearance of the neighborhood, the loquacity of a respondent, the 
absence of a respondent, etc.), bias can usually be avoided. Generally, the 
randomization of people or families in house-to-house interviews in a given 
area must be accomplished by a systematic design of alternation or selection, 
laid out on a map or schedule in advance of the actual field work. When homes 
or families can be numbered, a table of random numbers or a lottery device 
can be used in selecting those to be included in the sample. 

Principle of Inertia of Large Numbers 

Some of the interviewers for the 1940 Census were evidently not too well 
trained for their jobs, and consequently certain biases may have entered 
some of the interviews. If this was true of only a small proportion of the 
interviews, the practical effect of such errors of measurement, or of sampling, 
would most likely be negligible because of the thousands of interviewers at
work, and because of the operation of the principle of the inertia of large
numbers. Thus, the 5% sample obtained in the 1940 Census amounted to more
than six million cases. An extra million or two would not be likely to make
any practical difference in the over-all results of the national sample, even
though some interviewers may not have adhered too strictly to the rules for 
randomizing the sample on each 20th case, in order of interviews. 

It should be emphasized in this connection that merely increasing the size 
of the sample will not eliminate any biases inherent in the general sampling 
technique used. Nor will the replication (or repetition) of a sample increase 
the soundness of the result if the same defective sampling method is employed
for both samples. In other words, the principle of the inertia of large
numbers does not mean that the addition of thousands or millions of cases 
to an already sizable sample will eliminate the effect of bias inherent in the 
sampling technique, but rather, that the effects of bias or constant errors 
may become negligible if such errors occur in only a small proportion of the 
observations or measurements. 
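The difference between a constant error built into the whole method and a constant error confined to a few observations can be illustrated with a small simulation. Every figure in it is hypothetical: group A (30% of the universe, 60% in favor of some proposal) is over-represented either in all interviews or in only 2% of them, while the true proportion in favor is 0.39. Enlarging the sample does nothing for the wholly biased method, which stays near 0.60, but the 2% contamination moves the expected estimate by only about 0.004.

```python
import random

# Hypothetical universe (all figures assumed for illustration):
# group A is 30% of the universe, with 60% in favor of a proposal;
# group B is the other 70%, with 30% in favor.
SHARE_A, P_A, P_B = 0.30, 0.60, 0.30
TRUE_P = SHARE_A * P_A + (1 - SHARE_A) * P_B  # 0.39

def estimate(n, biased_share, seed=0):
    """Share in favor found in n interviews when a fraction `biased_share`
    of them is drawn from group A only (a constant error) and the rest are
    drawn at random from the whole universe."""
    rng = random.Random(seed)
    favor = 0
    for _ in range(n):
        if rng.random() < biased_share:
            in_group_a = True                     # biased selection
        else:
            in_group_a = rng.random() < SHARE_A   # random selection
        rate = P_A if in_group_a else P_B
        favor += rng.random() < rate
    return favor / n

for n in (1_000, 100_000):
    print(n,
          round(estimate(n, biased_share=1.0), 3),   # bias built into the method
          round(estimate(n, biased_share=0.02), 3))  # bias in 2% of the cases
```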

The Sampling Unit 

The sampling unit of an investigation is the basic entity whose characteristics
or behavior is to be studied. The basic sampling unit in biology and
the social sciences is usually the individual organism or person. However, 
the sampling unit is often the family in consumer research studies. It may be 
the household, a farm, a business organization, a school, a type of crop, even 
a molecule. The sampling unit for a given investigation naturally depends
on the nature and purpose of the study.

Initial or primary sampling units are sometimes used and are to be distinguished
from the sampling unit per se. That is, the sampling units may
lie within a large geographical area, and the initial sampling made with 
respect to geographical subdivisions prior to the sampling of the individual 
units that are to be studied. 

In market and public opinion research, an initial random sample of geographical
or areal units is sometimes obtained, such as townships or sections
in sparsely populated areas, and blocks in cities. These units are numbered,
and those to be used are then selected by means of a table of random numbers.
The sample of people whose behavior or opinions are to be studied consists of those
residing in each such geographical unit. Or random sub-samples within each
geographical unit may be drawn. But the basic sampling unit is still a person,
and not the geographical areas. The latter are units of the distribution of
people and are used as the initial or primary sampling units when a universe is
too large for feasible randomization of people from it as a whole.
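A two-stage selection of this kind might be sketched as follows: the areal units (here, city blocks) are numbered and a random sample of them is drawn, then a random sub-sample of residents is taken within each chosen block. The city, the block sizes, and the sample sizes are all hypothetical.

```python
import random

def two_stage_sample(blocks, n_blocks, per_block, seed=None):
    """Stage 1: a random sample of numbered areal units (blocks).
    Stage 2: a random sub-sample of persons within each chosen block.
    The persons, not the blocks, remain the basic sampling units."""
    rng = random.Random(seed)
    block_ids = sorted(blocks)                   # the numbered areal units
    chosen_blocks = rng.sample(block_ids, n_blocks)
    persons = []
    for b in chosen_blocks:
        residents = blocks[b]
        k = min(per_block, len(residents))       # take everyone in a small block
        persons.extend(rng.sample(residents, k))
    return chosen_blocks, persons

# Hypothetical city: 200 blocks with 20 to 60 residents each.
rng = random.Random(0)
city = {b: [f"b{b}-p{j}" for j in range(rng.randint(20, 60))] for b in range(1, 201)}
blocks_drawn, respondents = two_stage_sample(city, n_blocks=25, per_block=8, seed=11)
print(len(blocks_drawn), len(respondents))  # 25 200
```

Because each chosen block contributes the same number of persons whatever its population, residents of crowded blocks have a smaller chance of selection than residents of sparse ones; unless the sub-samples are made proportional to block size or the results are weighted, the crowded blocks are under-represented. The same weighting question arises with stratified-random sampling.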

The size and number of geographical units to be employed depend upon
the purpose of the investigation, the heterogeneity of the people in the universe
to be sampled, and their density of concentration per areal unit. Generally,
smaller geographical units and a greater number of these units for
sampling give a more dependable result than larger areal units and a correspondingly
smaller number of them. This is the case because the homes of
people in a city, state, or nation are not randomly distributed. That is, people 
of the same color group, or of similar educational or economic classes, tend 
to be located in one section—“the West Side” vs. “the other side of the 
tracks,” etc. And for many types of market and public opinion investigations, 
there is a correlation between such factors as people’s skin color, economic 
status, and education, and their buying habits, social attitudes, opinions on 
public questions, etc. 
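The reason smaller and more numerous areal units give a more dependable result can be shown with a rough simulation. It assumes, hypothetically, 500 equal-sized blocks whose residents tend to think alike (half the blocks 20% in favor, half 80%), and compares 400 interviews taken as 10 blocks of 40 with 400 interviews taken as 100 blocks of 4. The estimates scatter much more in the first design, because each block contributes a lump of like-minded respondents.

```python
import random
import statistics

def cluster_spread(n_blocks, per_block, reps=1000, seed=0):
    """Standard deviation of the estimated share in favor over many repeated
    samples of n_blocks blocks with per_block interviews in each."""
    rng = random.Random(seed)
    # Hypothetical city of 500 equal-sized blocks: residents of a block tend
    # to think alike, so half the blocks are 20% in favor and half 80%.
    block_rates = [0.2 if b % 2 else 0.8 for b in range(500)]
    estimates = []
    for _ in range(reps):
        favor = 0
        for rate in rng.sample(block_rates, n_blocks):  # random sample of blocks
            favor += sum(rng.random() < rate for _ in range(per_block))
        estimates.append(favor / (n_blocks * per_block))
    return statistics.stdev(estimates)

few_large = cluster_spread(n_blocks=10, per_block=40)   # 400 interviews
many_small = cluster_spread(n_blocks=100, per_block=4)  # 400 interviews
print(round(few_large, 3), round(many_small, 3))
```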

The difficulties encountered in obtaining random sampling units from
areal or other initial sampling units are so great that the technique of stratified-random
sampling is now usually employed. With this method, as we shall see
in the following section, the results for any areal unit of the total sample
can be prevented from overweighting the result. Thus, the difference in the 
concentration of family units in poor and in wealthy neighborhoods can be 
adjusted for and given the proper weight in the total sample. 

Public school systems or schools, rather than geographical areas, are taken 
as the units for sampling in some types of investigations, as in educational 
or public finance research. All the public schools in a city, county, state, or 
nation can be numbered and a random sample drawn. The relevant aspects 
of each school (pupils, teachers, textbooks, financial records, etc., depending 
on the purpose of the investigation) will comprise the basic sampling units. 
If it is the average size of classes, and their variation, that is to be determined 
from a sample of schools, the basic sampling units will be classes instead of 
pupils. If it is the ratio of average daily attendance to enrollment that is 
being investigated, the initial units, schools, will also be the basic sampling 
units. 

When the initial sampling unit differs from the basic sampling unit,
stratified-random sampling is usually better than employing the initial units
and depending upon relevant factors to randomize themselves among these
units. Possible biasing factors can be controlled better by the stratification
technique.


STRATIFIED-RANDOM SAMPLING

Definition

A stratified-random sample consists of two or more random samples drawn
from two or more subdivisions (or strata) of the universe, each stratum
having been established with respect to one or more secondary control
factors. The size or weight of the sample from each stratum corresponds to
the proportionate size or weight of the control factors in the universe being
studied. The total sample result is therefore derived from a series of random
samples, each of which is properly weighted for the proportionate incidence
of the control factors in the universe. The secondary control factors in stratified
sampling are traits or behavior that correlate to some degree with the
attribute to be investigated. They are the criteria on the basis of which a
universe is subdivided into two or more strata, or combinations of strata, each
of which is then sampled.
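The selection just defined (stratum sizes proportional to each stratum's share of the universe, random selection within each stratum) might be sketched as below. The function name, the largest-remainder rounding of stratum sizes, and the example population are assumptions. The stratum key may be a single factor or a tuple of factors, which gives the combinations of strata mentioned above.

```python
import random
from collections import defaultdict

def stratified_random_sample(population, stratum_of, n, seed=None):
    """Proportional stratified-random sample of n members.

    `stratum_of(member)` returns the member's stratum: one control factor, or
    a tuple of factors for combined strata. Each stratum gets a share of n
    equal to its share of the universe (largest-remainder rounding), and is
    then sampled at random."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)
    N = len(population)
    quotas = {s: n * len(members) / N for s, members in strata.items()}
    sizes = {s: int(q) for s, q in quotas.items()}
    # Give any places lost in rounding down to the largest remainders.
    leftover = n - sum(sizes.values())
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - sizes[s], reverse=True)
    for s in by_remainder[:leftover]:
        sizes[s] += 1
    return {s: rng.sample(strata[s], sizes[s]) for s in strata}

# Hypothetical universe of 1,000 people, 40% of them male.
people = [{"id": k, "sex": "M" if k % 5 < 2 else "F"} for k in range(1000)]
sample = stratified_random_sample(people, lambda p: p["sex"], n=150, seed=5)
print({s: len(m) for s, m in sample.items()})  # {'M': 60, 'F': 90}
```

For a population that recorded two factors, a key such as `lambda p: (p["sex"], p["age_group"])` (with `age_group` a hypothetical field) would yield one random sample for each combination of the two factors.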

In the literature, stratified-random samples are often referred to simply 
as stratified samples. However, when randomization is employed as the 
primary control factor, as should be the case when the total result is to be 
treated as if it were a random sample of the universe studied, then it is important
to emphasize the fact that randomization, the basic factor of control
in all scientific sampling, has been utilized. If it has not been employed, as
sometimes happens even though the sample is stratified, the result should
obviously not be called a stratified-random sample. The stratified samples
employed in many public opinion and market research studies are in fact not
stratified-random samples but simply stratified samples.

Stratification consists in the drawing of samples in such a way that the 
representation of the stratifying factors in the total sample corresponds to 
their respective proportions in the universe studied. The principle is a useful
control technique provided (1) that there is a significant correlation between
the control factor, or factors, and the trait or behavior to be studied, and 
(2) that the necessary information about the universe is available so that the 
stratification can be based on facts and not merely on guesswork. The latter 
condition is the more difficult to satisfy in many investigations because of 

the lack of up-to-date and dependable information about the characteristics 
of the universe to be studied. 

Stratifying Factors 

In public opinion and market research investigations in which the universes
may be voters’ preferences and opinions on public questions, or consumer
preferences for and habits in the use of a brand commodity, etc., the
most relevant factors for stratification are generally such attributes or characteristics
as sex, age, socio-economic status, education, residence (urban vs. rural),
geographical or sectional location in the United States, and occupation. In many
such studies, additional factors, such as skin color, nationality background,
religious affiliation, political party affiliation, etc., may be relevant for stratification.

Diminishing Returns 

It usually is not feasible or helpful to stratify for all these various factors 
in an investigation, even when all the relevant information about the universe 
is available and stratification is possible. For one thing, the principle of 
diminishing returns operates quickly, both statistically and cost-wise. Furthermore,
it becomes increasingly difficult to obtain random strata-samples for
more than a few factors because of the inter-correlation of the factors themselves
and the attendant difficulty of locating and identifying the sampling
units when these units are people. 

The Technique of Stratification 

The stratification of a group of people whose attitudes or opinions are to
constitute the universe under investigation consists, then, in drawing a series
of random samples in such a way that the proportions of the stratifying factors
in the sample will correspond to their proportions in the universe. If, for
example, a researcher is to study for the manager of a motion-picture theater
the attitudes of the people in a city toward the prevailing policy of double-feature
movies, he can obtain a much more satisfactory sample, at the same
unit cost, if the public is stratified for sex and age, provided there is a correlation
between sex and such attitudes, and between age and such attitudes.

If a preliminary test-tube sample of 100 people showed the correlation between
sex and attitudes given in Table 11:1, this result could well be used
as the basis for a control factor in the main survey, since the double feature
appeals more to women, and the single feature appeals more to men. Instead
of one random sample taken without regard to the sex of the respondents,
two random samples would be drawn, one for each subdivision of the stratum,
viz., one of males and one of females. Each would be drawn to a size such that 
the respective proportions of males and females in the total sample would be 
similar to their proportions in the universe whose attitudes are to be studied. 
If the proportion of males in the universe is 50%, the male stratum-sample 
would consist of a random sample equal in size to that of the female stratum- 
sample. (If the sample results were unequal in size, they could be given equal 
weights in the total result.) 


Table 11:1

                                           Sex
                                      Male    Female      n_r
Type of Theater    Double Feature      15        45        60
Operation          Single Feature      30        10        40
Preferred
                   n_c                 45        55     N = 100
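Treating the test-tube counts of Table 11:1 as if they were stratum-sample results, the sketch below combines the two sex strata with weights equal to their assumed shares of the universe (50% each, as in the example above) rather than by pooling the counts. Because the test-tube sample happens to contain 45 males and 55 females, the pooled figure (0.60) leans toward the women's preference, while the weighted figure is 19/33, or about 0.576.

```python
from fractions import Fraction

# Table 11:1 (test-tube sample): numbers preferring each type of operation.
table_11_1 = {
    "Male":   {"double": 15, "single": 30},
    "Female": {"double": 45, "single": 10},
}
# Assumed proportions of each sex in the universe (50% each, as in the text).
universe_share = {"Male": Fraction(1, 2), "Female": Fraction(1, 2)}

def double_feature_rate(counts):
    """Share of a stratum preferring the double feature."""
    return Fraction(counts["double"], counts["double"] + counts["single"])

rates = {sex: double_feature_rate(c) for sex, c in table_11_1.items()}
pooled = Fraction(sum(c["double"] for c in table_11_1.values()),
                  sum(c["double"] + c["single"] for c in table_11_1.values()))
# Weight each stratum's rate by that stratum's share of the universe.
weighted = sum(universe_share[sex] * rates[sex] for sex in rates)

print({sex: float(r) for sex, r in rates.items()})  # Male ~0.333, Female ~0.818
print(float(pooled), float(weighted))               # 0.6 vs ~0.576
```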


Correlated Control Factors Reduce the Sampling Error 

By this control device, the effect of some of the error attributable to chance 
factors can be reduced. A single random sample cannot be expected to give 
a truly representative result for the attitudes of the universe studied, nor 
would such a sample ordinarily give the same proportion of males and females 
as in the universe. The chance error effects caused by too many males or 
too many females in a single random sample can be avoided. On the basis of 
the correlation shown by the above cross-tabulation of the test-tube sample, 
an undue proportion of females in a random sample would overweight the 
result in favor of double-feature movies, and too many males would overweight
the result in favor of single-feature movies.

But, it may be asked, why should not a single random sample of the universe
being studied avoid the errors caused by an atypical division of the
sexes? If, as assumed, the single sample were truly random, no constant errors 
of sampling would enter into the result, and it is these errors, as we have seen, 
that make for bias. In answer, it is to be emphasized that the failure to 
stratify for sex in the preceding example would not make for bias, so long 
as the single sample was a random sample. Rather, for a random sample of 
a given size, say 1000 cases, the effect of chance errors of sampling would be 
reduced (not eliminated) by stratifying the sample on the basis of a factor 
that correlates with the attitude or behavior to be investigated. 

Depending on the size of the sample, the proportion of the sexes
in a single random sample will vary purely on the basis of
chance. The situation here is exactly analogous to the tossing of 10 coins. 
In the long run, we would expect the results to average 5 heads and 5 tails, 
if the coins themselves were fair. On any single toss, however, we might get 

any one of 11 possible combinations, ranging from all heads, through 5 heads
and 5 tails, to all tails. Purely on the basis of chance, the “long shot”
(10 heads or 10 tails) could occur. Similarly, if the division of the sexes in a
universe is equal, then truly random samples should in the long run yield an 
average of 50% males and 50% females. On the basis of chance alone, a single 
random sample could, however, yield a “long-shot” result. 
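The coin analogy can be put in numbers with the binomial distribution, assuming fair coins (or a universe exactly half male) and independent draws. The same function gives the chance that a random sample of 100 from such a universe contains 60 or more males: small, but far from zero.

```python
from math import comb

def binomial_probabilities(n, p=0.5):
    """P(k heads) for k = 0..n in n independent tosses with P(heads) = p."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

coins = binomial_probabilities(10)
print(len(coins))           # 11 possible outcomes, 0 to 10 heads
print(coins[5], coins[10])  # 252/1024 for 5 heads; 1/1024 for the long shot
# Chance that a random sample of 100 from a half-male universe has 60+ males.
print(sum(binomial_probabilities(100)[60:]))  # roughly 0.03
```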

The stratification of a sample on the basis of a factor that correlates with 
the trait or behavior to be studied thus reduces the variability of the error 
effects on the sample results. For a given size of sample, the results obtained
with a stratified-random sample will therefore be more precisely representative
of the universe than those obtained with an unstratified random sample.
The two controls, the primary one of randomization and the secondary one
of stratification, will increase the accuracy of a result for a total sample of
a given size. But if there is no correlation between a stratifying factor and the
attribute or behavior, then the over-all random sample will yield a result
that is just as satisfactory (but no more and no less) as the stratified-random
sample.
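The gain from stratifying on a correlated factor can be checked by simulation. This sketch assumes a universe that is half male and half female, takes the Table 11:1 test-tube rates (one man in three and nine women in eleven preferring double features) as the true universe rates, and compares 100-person simple random samples with 100-person samples stratified 50/50 by sex. With correlated rates the stratified estimates scatter less (roughly 0.043 against 0.049); with the same rate for both sexes the two designs do equally well.

```python
import random
import statistics

def design_spread(p_male, p_female, stratified, n=100, reps=3000, seed=0):
    """Standard deviation, over repeated samples of size n, of the estimated
    share preferring double features in a universe assumed to be half male."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        if stratified:
            # Stratified-random sample: exactly n/2 respondents of each sex.
            sexes = ["M"] * (n // 2) + ["F"] * (n // 2)
        else:
            # Simple random sample: the sex split is left to chance.
            sexes = ["M" if rng.random() < 0.5 else "F" for _ in range(n)]
        favor = sum(rng.random() < (p_male if s == "M" else p_female) for s in sexes)
        estimates.append(favor / n)
    return statistics.pstdev(estimates)

# Correlated factor: Table 11:1 rates taken (by assumption) as universe rates.
srs_corr = design_spread(1/3, 9/11, stratified=False)
strat_corr = design_spread(1/3, 9/11, stratified=True)
# Uncorrelated factor: the same (hypothetical) rate for both sexes.
srs_flat = design_spread(0.6, 0.6, stratified=False)
strat_flat = design_spread(0.6, 0.6, stratified=True)
print(round(srs_corr, 3), round(strat_corr, 3))
print(round(srs_flat, 3), round(strat_flat, 3))
```

Because the allocation here is exactly proportional, the stratified estimate is simply the overall share in favor; with unequal allocation the stratum results would be weighted as in the Table 11:1 sketch.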

Stratified-random samples are thus a device for increasing the accuracy of
a single random sample of a universe. For a given research budget, the researcher
can obtain more adequate results; or, for a given size of sample, he
can get more information at the same per capita cost. Stratification of course
involves expenses that are not incurred in an over-all random sample, such as
the cost of determining the divisions of a population with respect to factors
known or thought to be relevant. Such data, however, are becoming increasingly
available as a result of governmental and private censuses of cities,
rural areas, counties, states, etc. Furthermore, by means of preliminary test-tube
surveys, the possible relevance of one or more secondary control factors
can be determined. Such preliminary surveys should be made for each specific
investigation, so that the procedures of observation or measurement (the construction
of the interview, the wording of a questionnaire, etc.) can be pre-tested
for possible biases.

The Inter-Relation of Stratifying Factors 

In the preceding example it was indicated that age, as well as sex, might be
a relevant factor in stratifying a sample for determining people’s attitudes
toward double-feature movies. If the preliminary test-tube survey
showed a result like that in Table 11:2, a random sample would also be
relevantly controlled if it were stratified for age as well as sex, because of the
younger people’s greater preference for double-feature movies and the older
people’s preference for single-feature movies.

Table 11:2

                                             Age
                                    Under 30   30 and Over     n_r
Type of Theater    Double Feature      40           20          60
Operation          Single Feature      10           30          40
Preferred
                   n_c                 50           50       N = 100

However, the two factors of
sex and age are not independent, since all sampling units of the universe will
be identified with both a sex group and an age group. Consequently, to stratify
for both these factors, their inter-relation must be taken into account. That is,
the investigator would not proceed by drawing a random sample of 50% men
and 50% women, and a second random sample of 50% “young” and 50%
“old,” according to the proportions of two such dichotomized age groups in
the universe. Rather, he will stratify for both factors simultaneously and draw
four random strata-samples, one for each combination of the two factors, as
indicated by the cells in Table 11:3.

Table 11:3

                          Sex
Age               Males    Females    Total
30 and Over        25%       25%        50%
Under 30           25%       25%        50%
Total              50%       50%       100%

Such a division of the two combined
factors, sex and age, into four equal-sized groups would necessarily be based on
knowledge that the sampling units (people) of the universe to be studied
consisted of 50% males and 50% females, and that half of each of these groups
were 30 years of age and over. In practice, such an even distribution of these
two factors would not be typical for many universes.

The importance of defining the universe to be studied prior to the investigation
was emphasized earlier, and is well illustrated by the above example.
The researcher doing the study of people’s attitudes toward double-feature 
movies for the motion-picture operator would not include everyone in the 
city in the universe to be sampled; he would eliminate infants and young 
children (perhaps under the age of 10) as well as the incapacitated. On the 
other hand, he would not restrict the universe solely to those who attend 

movies regularly, because such a restriction would eliminate the people who 
might go more regularly if double features were eliminated. The sampling 
units for the universe might thus consist of anyone over 10 years of age, 
except the bedridden and those otherwise unable to attend theaters. 

The Inter-Correlation of Stratifying Factors

Preliminary analysis of test-tube results often makes it possible to determine
roughly whether the addition of control factors is likely to increase the
representativeness and precision of a sample result, or whether a particular
factor may be as adequate for control purposes as would its combination with
another factor. How such an analysis is made is illustrated by Table 11:4,
showing cross-tabulations of the hypothetical test-tube data for sex and age
differences in attitudes toward double-feature movies.

Table 11:4

                                         Male                   Female          Total, Male
                                Younger  Older   n_r    Younger  Older   n_r    and Female n_r
Type of      Double Feature       13       2      15       27      18     45         60
Theater      Single Feature        9      21      30        1       9     10         40
Operation
Preferred    n_c                  22      23      45       28      27     55      N = 100

From this analysis
with respect to both age and sex, it is evident that sex is probably a more
relevant control factor than age, because a greater proportion of the females
than males, regardless of age, prefer double features. It does not follow that
age is entirely irrelevant as a control factor, as can be seen from the rearrangement
of the same data in Table 11:5.

Table 11:5

                                        Younger                  Older           Total of
                                Male   Female   n_r      Male   Female   n_r    Young and Old n_r
Type of      Double Feature      13      27      40         2      18     20          60
Theater      Single Feature       9       1      10        21       9     30          40
Operation
Preferred    n_c                 22      28      50        23      27     50       N = 100

In the older group, the correlation
between sex and preference is high, the women preferring double features
in a ratio of 18 to 9, and the men preferring single features in a ratio of 21
to 2. In the younger group, the women and girls prefer double features in a
ratio of 27 to 1, but the men and boys also prefer the double feature, although
the ratio is only 13 to 9.

Thus both age and sex would be relevant control factors in this investigation,
although the principle of diminishing returns operates when the age
control factor is added to the sex factor.
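The comparison drawn from Tables 11:4 and 11:5 can be reproduced by converting the counts into proportions preferring double features: pooled by sex, pooled by age, and within each of the four cells. A minimal Python sketch using the table's counts (the helper name `share_double` is illustrative only):

```python
# Counts from Table 11:4 as (prefer double feature, prefer single feature).
counts = {
    ("male", "younger"): (13, 9),
    ("male", "older"): (2, 21),
    ("female", "younger"): (27, 1),
    ("female", "older"): (18, 9),
}


def share_double(cells):
    """Proportion preferring double features, pooled over the given cells."""
    double = sum(counts[c][0] for c in cells)
    total = sum(sum(counts[c]) for c in cells)
    return double / total


by_sex = {s: share_double([(s, "younger"), (s, "older")]) for s in ("male", "female")}
by_age = {a: share_double([("male", a), ("female", a)]) for a in ("younger", "older")}
by_cell = {c: share_double([c]) for c in counts}

for label, table in (("sex", by_sex), ("age", by_age), ("cell", by_cell)):
    for key, p in table.items():
        print(label, key, round(p, 2))
```

With these counts, about 0.82 of the females (45/55) but only 0.33 of the males (15/45) prefer double features, while the pooled age split is 0.80 (40/50) against 0.40 (20/50). Within each sex the younger group still leans more toward double features (0.96 against 0.67 for females, 0.59 against 0.09 for males), which is why age remains a relevant control.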


Sub-Universes in Stratified-Random Sampling 

Whatever the control factors used in stratifying a universe, and consequently
in taking a random sample of that universe, the procedure of stratified-random
sampling consists in sampling several independent parts of the
whole, or universe, being studied. The whole is not randomized directly;
rather, it is sampled by means of a number of random samples. Each part
sample is a random sample of the members of a given cell, or multiple class
(for example, all women over 35 years of age). The result obtained from each
such sample is for a part of the whole. Hence, each random sample of an
independent part of a universe may be called a random sample of a sub-universe.
The process or technique of stratified-random sampling can therefore
be looked upon as building up a sample for a universe by
bringing together the results of a series of random samples for the sub-universes
that make up the whole, each such part sample being appropriately
weighted to yield a single sample result for the universe.
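The building-up of a universe result from sub-universe samples can be sketched as a weighted sum: each stratum's sample proportion p_h is weighted by W_h, the stratum's share of the universe, giving p = Σ W_h p_h. The Python below is a minimal sketch under the assumption that each sub-universe is available as a list of 0/1 observations (1 = has the trait) that can be sampled at random; the function name and the universe in the usage lines are hypothetical.

```python
import random


def stratified_estimate(strata, n_per_stratum, rng):
    """Combine random samples of sub-universes into one estimate for the universe.

    strata: dict mapping stratum name -> list of 0/1 observations for that
        sub-universe (1 = has the trait being studied).
    n_per_stratum: dict mapping stratum name -> sample size for that stratum.
    """
    universe_size = sum(len(units) for units in strata.values())
    estimate = 0.0
    for name, units in strata.items():
        sample = rng.sample(units, n_per_stratum[name])  # random within the stratum
        p_h = sum(sample) / len(sample)                  # part-sample result
        w_h = len(units) / universe_size                 # weight of the sub-universe
        estimate += w_h * p_h
    return estimate


# Hypothetical universe: 600 women (400 with the trait) and 400 men (100 with it).
universe = {
    "women": [1] * 400 + [0] * 200,
    "men": [1] * 100 + [0] * 300,
}
rng = random.Random(2024)
print(stratified_estimate(universe, {"women": 60, "men": 40}, rng))  # true value is 0.5
```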

The sampling units of some sub-universes are relatively easy to identify,
segregate, and sample independently of one another; this is true of universes
that are stratified for geographical location or socio-economic status by
residence areas. With other sub-universes, however, this is extremely difficult.
For example, the four groups in the preceding example (older women,
older men, younger women and girls, and younger men and boys) are intermingled
in real life and cannot readily be approached as independent sub-universes
of the whole. In such a case, the use of additional control factors in
stratification is likely to complicate further the problem of locating the
sampling units in each sub-universe.

When the sampling units are people, stratified-random sampling is particularly
difficult to apply for more than a very few control factors at a
time. Where, for example, can a researcher locate the sampling units of a
sub-universe defined as "white, female, white-collar workers, over 35 years
of age, with a high school education, affiliated with the Democratic party,
of the B socio-economic group, and residing in area X"? Such a sub-universe
is not distributed in a manner that segregates it from other sub-universes,
nor are lists of the members of such sub-universes usually available. As
pointed out earlier, the principle of diminishing returns would make it unnecessary
to sample such a highly restricted sub-universe, even if all the
factors mentioned were correlated with the trait or behavior to be studied.

Internal Controls in Sampling 

In practice, when making the preliminary layout of his investigation, the
researcher stratifies for very few factors if people are the sampling units.
These factors are primarily external: density of population, geographical
section or area, and residence, the latter especially in relation to socio-economic
status. Sometimes, in city sampling, sections or neighborhoods are
also stratified for nationality or cultural background when the city contains
neighborhoods which have a relatively homogeneous concentration of people
for this factor ("Little Italys," etc.).

While he is obtaining a random sample of observations or measurements,
the researcher brings in internal controls by determining not only the respondents'
attitudes or opinions or consumer habits, but also their sex, age,
occupation, education, etc. This additional information can be checked against
the known distributions of these factors in the universe. By comparing the 
characteristics of the sample with the known characteristics of the universe, 
the investigator can determine whether he has in fact obtained a fairly typical 
cross-section as regards such factors as sex, age, education, occupation, etc. 
Where correlations are found between one or more of these factors and the 
trait or behavior to be studied, he can make adjustments in atypical samples 
for the purpose of reducing (not eliminating) the sampling error. 

The possibilities of making such an adjustment by an analysis of a sample
result are well illustrated by the Literary Digest poll during the presidential
campaign of 1936. Each respondent in the mail-ballot poll of telephone subscribers
and automobile owners was asked to indicate whom he had voted
for in 1932. As Cornfield has pointed out,* it was possible to use the published
Literary Digest results and predict the likelihood of a Roosevelt majority.
The published results showed that 52% of those responding to the poll and
reporting on how they had voted in 1932 had voted for Hoover; however,
Hoover received only 41% of the vote that year. Thus the sample was heavily
overweighted with 1932 Hoover voters, who in turn were overweighting the
1936 poll in favor of the Republican candidate.
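Cornfield's point can be expressed as a set of adjustment weights: each 1932-vote group in the poll is weighted by its share of the actual 1932 vote divided by its share of the respondents. The sketch below computes those weights from the two figures given above (52% and 41%). It does not produce an adjusted 1936 figure, because that would require each group's 1936 poll returns, which are not reported here. The function names are hypothetical.

```python
def poststrat_weights(sample_shares, universe_shares):
    """Weight for each group = (group's share of universe) / (group's share of sample)."""
    return {g: universe_shares[g] / sample_shares[g] for g in sample_shares}


def reweighted_result(group_results, sample_shares, weights):
    """Combine per-group sample results using the adjusted group weights."""
    return sum(sample_shares[g] * weights[g] * group_results[g] for g in group_results)


# 1932 vote among Digest respondents vs. Hoover's actual 1932 share, as given in the text.
sample_shares = {"Hoover in 1932": 0.52, "other in 1932": 0.48}
weights = poststrat_weights(
    sample_shares,
    universe_shares={"Hoover in 1932": 0.41, "other in 1932": 0.59},
)
print({g: round(w, 2) for g, w in weights.items()})
# {'Hoover in 1932': 0.79, 'other in 1932': 1.23}
# Per-group 1936 results are not given in the text, so reweighted_result is not applied here.
```

Respondents who had voted for Hoover in 1932 are thus counted at about four-fifths of their raw weight, and the rest at about one and a quarter times theirs.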

Areal Sampling 

Not all sampling problems are as difficult as those that arise in public
opinion and market research surveys where the sampling unit is the citizen,
the voter, the housewife, etc. Scientific sampling techniques to control the
quality of production have come into widespread use in many manufacturing
industries in recent years. The basic problem here is to obtain a random
sample of the manufactured item as it comes off the production line. Similarly,
the problem of developing norms for psychological tests is today rarely
approached from the point of view of trying to obtain representative samples
of the performance of all adults in the United States, of all 16-year-olds, etc.
Rather, the universes are varied according to the particular purpose for
which the test is designed. Thus, if a test is to be used in a given industrial
situation, the norms will be developed on the basis of a random or stratified-random
sample of test results derived from workers in that situation.

* J. Cornfield, "On Certain Biases in Samples of Human Populations," Journal of the
American Statistical Association, 37:63-68, 1942.

The fact remains, however, that a great deal of the empirical progress that 
has been made during the past ten years in sampling statistics in the social 
sciences has been in the fields of public opinion and market research. How 
are the complications of random sampling and of stratified-random sampling 
being surmounted in these fields? A most hopeful sign of real progress is the 
development of areal sampling, a method which, for public opinion and 
market research, promises to approximate most closely the strict conditions 
imposed by randomization in sampling. 

Whether the sampling unit be people, particular kinds of people, family 
units, dwelling units, homes, or farms, the essential characteristics of areal 
sampling are as follows: (1) The minor civil units of a city, county, state, 
or nation are divided into small area units that are exclusive of each other. 
(2) Each sampling unit is associated with only one such area unit. (3) Some 
of the relevant characteristics of the sampling units of the area are known. 
Hence (4) a sample of areas can be used to establish stratified-random samples, 
for a city, county, larger rural area, state, geographical section, or an entire 
nation. 

The Bureau of the Census and the Bureau of Agricultural Economics have 
for some time given serious attention to the compilation of information on 
the relevant characteristics (or factors that can be used as secondary controls 
in stratified-random sampling) of small areas.* 

On the basis of the 1940 census data (much of which is now out of date),
Block Summary Cards for the 191 U.S. cities with 50,000 or more inhabitants
in 1930, and E.D. Summary Cards for each of the 154,000 Enumeration
Districts of the Census not included in the areas covered by the Block Summary
Cards, were made available several years ago for use in sampling. The
E.D. Summary Cards are identified by state, county, and minor civil division
and include such information for each Enumeration District as the following:
total population (usual range, between 500 and 1500); native white population;
percentage native white male; Negro population and percentage male;
number and percentage farm population and total number of farms; total
number of dwelling units; number and percentage of owner-occupied dwelling
units; and average rent per dwelling unit (including estimated rental values
of owner-occupied units). The Block Summary Cards identify the location
of each block in the 191 cities and give such information as number of structures;
total dwelling units; vacant and occupied dwelling units; owner-occupied
and tenant-occupied dwelling units; dwelling units occupied by
non-white households; and average monthly rent per dwelling unit. The city
block is usually smaller than the Enumeration District and is consequently a
better areal unit for general sampling purposes.

* M. H. Hansen and W. E. Deming, "On Some Census Aids to Sampling," Journal of
the American Statistical Association, 38:353-357, 1943.

When such information as the preceding is up to date, a sample can be
stratified not only with respect to geographical section, urban and rural
communities, and density or concentration of dwelling units and of population,
but also for socio-economic status in terms of average monthly rental;
for white, Negro, and mixed area units; for tenants and owners of dwelling
units, etc. Within each stratum, a random sample of blocks or Enumeration
Districts can be drawn systematically by means of random numbers or a
tested lottery device, and the sampling units (people, family units, etc.)
within each of the area units can be interviewed. Or sub-samples of the
family units, etc., within the area units may be drawn by a random method.
The latter procedure is particularly practical for area units with a large number
of sampling units, especially if they are fairly homogeneous in the relevant
stratifying characteristics.
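The two-stage procedure just described (a random sample of blocks within each stratum, then either every dwelling unit or a random sub-sample of units within each drawn block) can be sketched in Python as follows. The block identifiers, the rent strata, the function name, and the `max_units_per_block` cut-off are all hypothetical; weighting the results back to the universe, as the following paragraph describes, is a separate step not shown.

```python
import random


def areal_sample(blocks_by_stratum, blocks_per_stratum, max_units_per_block, rng):
    """Stratified two-stage areal sample (a sketch).

    blocks_by_stratum: dict stratum -> dict block_id -> list of dwelling-unit ids.
    blocks_per_stratum: dict stratum -> number of blocks to draw at random.
    max_units_per_block: take every dwelling unit in a drawn block if it has no more
        than this many; otherwise draw a random sub-sample of this size.
    Returns a list of (stratum, block_id, unit_id) triples.
    """
    chosen = []
    for stratum, blocks in blocks_by_stratum.items():
        # First stage: random sample of blocks within the stratum.
        drawn = rng.sample(sorted(blocks), blocks_per_stratum[stratum])
        for block_id in drawn:
            units = blocks[block_id]
            if len(units) <= max_units_per_block:
                picked = list(units)  # interview every unit in the block
            else:
                picked = rng.sample(units, max_units_per_block)  # random sub-sample
            chosen.extend((stratum, block_id, u) for u in picked)
    return chosen


# Hypothetical blocks, stratified by average monthly rent.
blocks = {
    "low rent": {"A": [1, 2, 3], "B": [4, 5], "C": [6]},
    "high rent": {"D": list(range(10, 20)), "E": list(range(20, 30))},
}
sample = areal_sample(blocks, {"low rent": 2, "high rent": 1}, 4, random.Random(1))
for row in sample:
    print(row)
```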

If the research problem were to determine the distribution of television 
sets in dwelling units, all the dwelling units in each of the areas (blocks and 
Enumeration Districts), stratified and drawn in the random sample, would 
constitute the basic sampling units. The results could be weighted for such 
known control factors as concentration of dwelling units and average rental, 
as well as analyzed in terms of such factors. If voters’ opinions on public 
questions or governmental policies were to be studied, opinions could be 
obtained from all the qualified voters in the stratified-random sample of 
areas drawn for the survey. 

The maintenance of up-to-date information on scores of thousands of area
units in the United States obviously poses something of a census problem as
well as a financial problem. The project is too costly for any private organization
to undertake; but because of the value of such information for scientific
sampling in market research, private business and industrial organizations
might well consider supporting such a project on a subscription basis. The
addresses, names, ages, and sex of everyone residing in each areal unit, if
kept up to date on an annual basis, would be particularly valuable supplementary
information to that already considered. Furthermore, when people
constitute the basic unit of sampling, the technique of randomization by
tables of random numbers or a lottery device can be applied to the people
themselves.*

* Cf. J. N. Webb, M. S. Northrop, and S. L. Payne, "Practical Applications of Theoretical
Sampling Methods," Journal of the American Statistical Association, 38:69-77, 1943.
These authors would perhaps consider an annual enumeration of all small areal units in
the United States both extremely impractical and unnecessary. Certainly such a program
would be impractical unless, as intended, the data were conveniently available on machine
cards for use by many government agencies as well as private organizations. The Master
Sample plan, described in the next section, may prove to be sufficiently satisfactory to
constitute an adequate body of information for all purposes.

There is a possible source of error in the researcher's knowing the respondents'
names and addresses, especially in public opinion research. As Stock
has pointed out, "There is evidence that people do not always give their
true opinions even to a stranger who does not know their names or addresses
and has no way of checking up on them again."* Errors that arise from
non-anonymity are errors of measurement rather than of sampling, and can
perhaps be overcome by a secret-ballot technique.†

* J. S. Stock, "Some General Principles of Sampling," chap. 10 in Hadley Cantril,
op. cit., p. 139.

† W. Turnbull, "Secret vs. Nonsecret Ballots," ibid., chap. 5.

The Technique of the Master Sample

Areal sampling on both the extensive scale (for the United States as a whole)
and the intensive scale (by Enumeration Districts and city blocks) is for the
future. In the meantime, the foundation for a technique for agricultural
surveys has been laid in the development of the Master Sample, suggested in
1943 by Rensis Likert of the United States Bureau of Agricultural Economics.‡

‡ A. J. King and R. J. Jessen, "The Master Sample of Agriculture," Journal of the
American Statistical Association, 40:38-56, 1945.

The Master Sample technique attempts to achieve the advantages of small-unit
areal sampling by means of a cross-section of areal samples for the universe
to be studied, rather than by the subdivision of the complete universe into
areal units. The Master Sample developed for agricultural surveys consisted
in 1945 of 67,000 sample areas, selected from practically all the 3070 counties
in the United States. They averaged about 2.5 square miles and about 5 farms
per areal unit. Their actual size varies with the density of the population;
they are largest in Nevada, where they average 108 square miles, and smallest
in Indiana, where they average 0.71 square mile. The total 67,000 sample
areas contain 1/18 of the land area of the United States, 1/18 of the farms
(about 300,000), and 1/18 of the population. Thus the Master Sample is a
cross-section sample equal to about 5.5% of the whole. It is stratified for the
civil character of the areas (Incorporated, Unincorporated, and Open
Country) as well as for density of population. By means of aerial photographs
of all the counties, work maps have been set up on which each areal
unit of the Master Sample is identified in relation to many more landmarks
than can be indicated on an ordinary map. In fact, the initial definition of
each areal unit was considerably facilitated by the use of aerial photographs.

The data of the 1945 Agricultural Census will be drawn upon to establish
relevant characteristics for the farms and people of each area in the Master
Sample. But, even without advance knowledge of the population characteristics,
the Master Sample can be used for many research problems in agricultural
economics. Plantings, estimates of yield, and actual yield for various
types of crops can be calculated quickly and efficiently.

The Master Sample technique will undoubtedly be extended for use in all
types of surveys, especially labor statistics, and may well prove a satisfactory
alternative to areal sampling of the total population. If it is generally satisfactory
from a sampling point of view, particularly with respect to the avoidance
of biasing factors, the Master Sample plan will certainly be more efficient
per unit cost. As in total areal sampling, it has the advantage of eliminating
freedom of choice on the part of the field workers, for the sampling for an
investigation can be completely designed in advance in the central office by
means of appropriately stratified and randomized techniques.

The Random-Point Method of Sampling 

The random-point technique of sampling, sometimes included in stratified 
sampling methods, consists in the random location of points on a map from 
which the sampling units are chosen, so many of the nearest households, 
farms, people, etc., being taken as the sample. The biases that may enter 
are unfortunately difficult to control. It is difficult, for example, to design a 
truly random method for selecting the specified number of sampling units 
at each point on the map. And, as was said earlier, a random selection of 
geographical units whose basic characteristics are unknown may result in an 
over-concentration of certain characteristics that will make for bias in the 
results. The stratification of factors among geographical units, as in areal 
sampling, is practically a “must” type of control for any sampling procedure 
in which the initial sampling units are located on a map. 
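The selection bias mentioned above is easy to see in a one-dimensional version of the method. In the sketch below, dwellings lie along a road of unit length (the positions and the function name are hypothetical), random points are drawn uniformly along it, and the nearest dwelling to each point is taken. A dwelling's chance of selection is proportional to the stretch of road lying closer to it than to any other dwelling, so isolated dwellings, and those on the edge of a cluster, are picked far more often than dwellings inside a cluster.

```python
import random


def random_point_sample(households, n_points, k, rng):
    """Random-point selection along a line (a road of length 1).

    households: list of positions in [0, 1].
    For each of n_points uniformly random points, take the k households nearest
    to it. Returns a list of lists of household indices.
    """
    picks = []
    for _ in range(n_points):
        point = rng.random()
        nearest = sorted(range(len(households)), key=lambda i: abs(households[i] - point))
        picks.append(nearest[:k])
    return picks


houses = [0.0, 0.01, 0.02, 1.0]  # three clustered dwellings and one isolated one
draws = random_point_sample(houses, 20_000, 1, random.Random(42))
freq = [sum(1 for d in draws if d[0] == i) / len(draws) for i in range(len(houses))]
print([round(f, 3) for f in freq])
```

With these positions the expected selection rates are about 0.005, 0.01, 0.495, and 0.49, although every dwelling is equally a member of the universe.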

The Stratified-Quota Method of Sampling 

The stratified-quota method is the sampling technique that has been successfully
employed by Gallup and others in public opinion and market research
studies. It consists essentially in the stratification of the universe to be studied 
so that the people in the sample will be in the same proportion as they are in 
the total population sampled. Four initial strata controls are generally used 
in a national sample: geographical section of the United States, rural-urban 
distribution, economic status, and color of respondent. Additional strata 
controls, such as age and sex, are usually employed during the interviewing 
process itself. Each interviewer in a specified area is given a definite quota of 
interviews for each cell of the inter-related strata. Thus a quota may be “ten 
white men, over forty, in the highest income group.” The interviewer attempts 
to avoid any biases in selecting these ten men, and the persons in the other 
cells in his quota schedule. The success of this technique depends to a great 
extent upon the interviewer’s ingenuity in avoiding biasing factors when he 
selects the people to be interviewed. 
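Turning the stratified cell proportions into an interviewer's quota schedule requires whole numbers of interviews that still add up to the assignment. One way to do this, shown as a Python sketch below, is the largest-remainder rule; this rounding rule is an assumption for illustration, not a procedure described in the text, and the cell shares in the usage lines are hypothetical.

```python
from math import floor


def quota_schedule(cell_shares, total_interviews):
    """Split an interviewer's assignment into integer quotas proportional to cell shares.

    cell_shares: dict cell -> proportion of the area's universe (summing to 1).
    Uses the largest-remainder rule so the quotas add up exactly to total_interviews.
    """
    raw = {cell: share * total_interviews for cell, share in cell_shares.items()}
    quotas = {cell: floor(x) for cell, x in raw.items()}
    shortfall = total_interviews - sum(quotas.values())
    # Give the leftover interviews to the cells with the largest fractional parts.
    for cell in sorted(raw, key=lambda c: raw[c] - quotas[c], reverse=True)[:shortfall]:
        quotas[cell] += 1
    return quotas


# Hypothetical cell shares for one interviewer's area.
shares = {
    "men, over 40": 0.27,
    "men, 40 and under": 0.23,
    "women, over 40": 0.28,
    "women, 40 and under": 0.22,
}
print(quota_schedule(shares, 25))
```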



This method is an approximation of the stratified-random sampling technique,
and is often so characterized because of the efforts to obtain random
samples of people in filling the quotas. Each quota for an area is set up so that 
its sample proportions correspond to the sub-universe proportions. If 5% of a 
sub-universe are in the highest socio-economic group, then 5% of the quota 
sample will include members of this group, or the sample result will be weighted 
according to this proportion. 

As in stratified-random sampling, the feasibility of stratified-quota sampling 
is dependent upon knowledge about the distribution of the stratifying factors 
in the universe to be studied. The stratified sample cannot be a “typical 
cross-section” (the usual descriptive phrase) unless the distribution of such 
factors as population concentration in the major geographical sections and 
in rural and urban areas, socio-economic status, color, sex, age, etc., is known 
for the universe. An internal control technique is also usually employed by a 
comparison of additional characteristics of respondents, obtained during the 
interview, with their known distribution in the universe; such characteristics 
include political party affiliation, occupation, education, nationality, etc. 

That the stratified-quota method has been generally successful in the prediction
of election returns is empirical testimony to its usefulness. It has the
advantages of being relatively inexpensive and of quickly yielding the sample
result. It is not, however, as foolproof a sampling method as is strict randomization
of the basic sampling units of a universe or of a series of sub-universes
established by stratification.

Chief Source of Error in Stratified Sampling 

The chief source of error in stratified sampling is the failure to obtain a 
truly random sample of observations or measurements for each control 
stratum of the universe sampled. We cannot overemphasize the fact that 
randomization is the primary control factor in all sampling. Stratification is a 
secondary control factor, even though random samples are often referred to 
in the literature as “simple samples,” whereas stratified samples are commonly 
called “controlled samples.” 

We have seen that randomizing a population by a lottery method which 
guarantees the randomness of the sample is often not feasible, and that 
stratified-random sampling might be utilized in such a situation, with even 
more satisfactory results. But, we may ask, if we have to draw a random 
sample for each stratum, why not just draw one large random sample at the 
beginning and be done with it? The answer is that in consumer and public 
opinion research, and similar fields, it is usually easier to apply to the sampling 
the secondary control, stratification, than the primary one, randomization. 
We randomize the best we can; and to the extent that we fail, we hope that 
stratification will compensate for the errors. 

It is because of this compensation effect that the measures of chance
error in stratified-random sampling are only infrequently adjusted to indicate
the greater accuracy or precision that should characterize samples of a given
size. The usual estimates of error employed in random sampling are also used
in stratified-random sampling in the hope that the stratification has compensated
for any failures in true randomization. Only when the method used to
randomize stratified samples guarantees randomness within each stratum are
we justified in reducing the statistical estimates of chance error in the results.
This, however, does not preclude making an internal analysis of control
factors and results in order to eliminate possible biases, or constant errors.
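The difference between the usual random-sampling estimate of error and the smaller stratified estimate can be illustrated with the counts of Table 11:4, treating the hypothetical test-tube proportions as if they were the stratum values, with sex as the only stratum and a proportionate sample of 100 (45 males, 55 females) assumed to be drawn at random within each stratum. The sketch uses the standard formulas SE = sqrt(p(1 - p)/n) for a simple random sample and SE = sqrt(Σ W_h² p_h(1 - p_h)/n_h) for a stratified sample, ignoring the finite-population correction; the function names are illustrative.

```python
from math import sqrt


def se_simple_random(p, n):
    """Standard error of a proportion from a simple random sample of size n."""
    return sqrt(p * (1 - p) / n)


def se_stratified(strata):
    """Standard error of the stratified estimate sum(W_h * p_h).

    strata: list of (W_h, p_h, n_h) = (stratum weight, stratum proportion, sample size).
    Valid only if sampling within every stratum is truly random.
    """
    return sqrt(sum(w * w * p * (1 - p) / n for w, p, n in strata))


# Table 11:4 by sex: (share of the 100 cases, proportion preferring double features, n).
strata = [(45 / 100, 15 / 45, 45), (55 / 100, 45 / 55, 55)]
print(round(se_simple_random(60 / 100, 100), 3))  # 0.049
print(round(se_stratified(strata), 3))            # 0.043
```

Here the stratified figure is about 0.043 against 0.049; as the paragraph above cautions, the smaller value is warranted only if randomness within each stratum is actually guaranteed.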

The "Representativeness" of Stratified Samples

Current terminological emphases often imply that a well-stratified sample
is necessarily representative of the universe sampled. Unfortunately, confusion
in interpretation has arisen in this regard. We earlier defined a representative
sample as one that is a replica, at least to a satisfactory degree, of
the universe sampled. This representativeness refers to the trait or behavior
being studied, and not to some other factor or characteristic. A sample of
consumer purchases of a given product is representative of the universe
sampled if the sample result corresponds to the consumer purchases of the
product made by all the members of the universe.

Yet, because a stratified sample is chosen so as to be a cross-section of a universe
in regard to several control factors, it is often referred to as a "representative
sample." The confusion arises from the use of "representativeness"
in both these situations. The stratified sample may indeed comprise sampling
units that are a typical cross-section of the universe in secondary control
factors such as residence, socio-economic status, age, sex, etc. But it does
not necessarily follow that the sample result will be representative (within
measurable limits of error) of the universe. Even when correlation is known
to exist between the secondary control factors of stratification and the trait
or behavior being sampled, we cannot be as confident of the representativeness
of the sample result as when randomization has been employed in selecting
the sampling units.

It would be better to call a stratified sample a typical cross-section of the
sampling units of the universe, and then describe the control factors used in
the stratification, than to refer to such samples as representative samples.

Finally, it should be emphasized that random sampling is not necessarily
inferior to stratified-random sampling. The latter is often markedly superior
per unit cost; and it is more efficient for a given size of total sample, particularly
when the sample is relatively small in relation to the size of the universe.
However, when a random sample of a universe is feasible, these advantages
of stratified-random sampling diminish, particularly if large samples are
employed, since the precision of a result varies little in large samples.



SOME FURTHER CONSIDERATIONS ABOUT SAMPLING 313 

R SOME FURTHER CONSIDERATIONS ABOUT SAMPLING 
Precision and Adequacy in Sampling 

In the preceding discussion of random and stratified-random sampling we
referred to the precision and adequacy of samples, as well as to their representativeness.
These terms as used in sampling statistics have rather specific
meanings. We earlier emphasized that the representativeness of a sample
is a matter of degree rather than of all-or-none, and that the sample is representative
of the universe sampled if only chance errors appear in the result.
If the sampling techniques employed warrant confidence in the likelihood
that a sample result is random for a particular universe, the degree to which
it is representative is denoted in terms of the precision of the result.

The precision of a sample result is evaluated in terms of the extent to which 
any measure derived from it agrees with the value of that measure for the 
universe sampled. Thus, if a universe mean is 50.0, the mean of one random 
sample of that universe is 45.0, and the mean of a second random sample is 
49.0, both sample means are representative of the universe mean in that they 
are derived from random samples of that universe, and the differences can 
therefore be expected to occur on the basis of chance errors of sampling. The 
mean of the second sample, however, is a more precise measure of the universe
mean than is that of the first sample; hence, the second mean is a more
representative measure. 

The adequacy of a sample result, on the other hand, is based on the concepts
of both representativeness and precision. Adequacy is a function of a particular
investigation. Thus, a random sample of clerical workers in a large business
organization may give results on a clerical efficiency test that will constitute
a sample sufficiently representative and precise for that universe to
be an adequate basis on which to develop test norms for all the clerical
workers in that organization, but which will be inadequate for the development
of test norms for clerical workers generally, or in another business
organization.

It should be evident that the adequacy of a sample result is contingent 
upon the methods used in sampling a given universe. Random samples and 
stratified-random samples of a given universe should always yield sample 
results that are to some degree representative of that universe. Precision, 
however, is a function of the size of the sample. Measures of the precision of a 
result in sampling statistics are developed on the assumption that only 
chance errors affect the result. The precision ordinarily increases in proportion
to the square root of the increase in the size of the sample (cf. Chapter 12,
Section E). The effect of increasing the sample size is to reduce the effect of 
chance errors of sampling. 
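The square-root relation can be checked with the standard error of a mean, σ/√n. In this sketch σ = 10 is an arbitrary, hypothetical value; each fourfold increase in n halves the standard error, that is, doubles the precision.

```python
from math import sqrt


def standard_error_of_mean(sigma, n):
    """Standard error of a sample mean under random sampling (infinite universe)."""
    return sigma / sqrt(n)


for n in (25, 100, 400, 1600):
    print(n, standard_error_of_mean(10.0, n))
# prints 25 2.0, 100 1.0, 400 0.5, 1600 0.25 (one pair per line)
```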

A sample result based on very few observations or measurements may 
not be adequate because it is likely to lack the precision necessary for an 



314 


SAMPLES AND SAMPUNG TECHNIQUES 


investigation. Likewise, it is impossible for a biased sample to be adequate, 
for it cannot be representative of the universe it was intended to sample. 

It should be evident from the foregoing that an attempt to evaluate the 
precision of an unrepresentative or biased sample is analogous to testing the 
sharpness of pieces of steel that have not yet been ground into knife blades. 
The first requisite in evaluating sample results is that the sampling and 
measurement techniques used are such as will yield a representative result 
for the universe. Once such results are obtained, they are then statistically 
evaluated for precision. If a representative sample is sufficiently precise to 
satisfy the purposes of the investigation, it can be characterized as adequate. 

This relationship between the representativeness, precision, and adequacy
of a sample result is analogous to that between the validity, reliability, and
adequacy of a psychological test (cf. Chapter 17). If a test measures or predicts
the kind of behavioral functions it is intended to measure or predict, it is
said to be a valid test of those functions (at least to some degree). If, furthermore,
it differentiates with a relatively high degree of accuracy the extent to
which a group of people manifest such functions, it is said to be a reliable
test. Hence it will be adequate for the intended purpose. Thus, if an intelligence
test is found empirically to predict the success and failure, at least to
some degree, of children in their later academic work, it may be considered
a valid test of academic aptitude. If, furthermore, the predictions made from
the test results hold with but few exceptions (or errors of prediction), the
test is highly reliable. And the more reliable a valid test is, the more adequate
it is. But the converse is not true; i.e., it does not follow that an invalid test
will become more adequate for the intended purpose if its reliability is increased.
The latter would be analogous to attempting to increase the adequacy
of a biased sample merely by increasing its size.
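The analogy can be made concrete with a small simulation. In the hypothetical poll below, the universe is evenly divided, but members favoring A are assumed to be twice as likely to be reached as the rest, so the expected sample share for A is 2/3 rather than 1/2. As n grows, the chance error that would apply to a random sample shrinks, yet the result settles near 0.667, not 0.5: more cases only make the biased answer more stable. The function name and the bias factor are illustrative assumptions.

```python
import random
from math import sqrt


def biased_poll(n, rng, p_true=0.5, inclusion_ratio=2.0):
    """Share favoring A in a sample of n when members favoring A are inclusion_ratio
    times as likely to be reached as the others (a hypothetical bias)."""
    p_reached = p_true * inclusion_ratio / (p_true * inclusion_ratio + (1 - p_true))
    return sum(rng.random() < p_reached for _ in range(n)) / n


rng = random.Random(7)
for n in (100, 10_000, 100_000):
    share = biased_poll(n, rng)
    chance_error = sqrt(0.25 / n)  # standard error if the sample were random (p = 0.5)
    print(n, round(share, 3), round(chance_error, 4))
```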

The Character of Samples vs. the Size of Samples 

These considerations of representativeness, precision, and adequacy in sampling,
as well as our earlier discussion of the importance of randomization
in scientific sampling, should make it evident that the character of a sample
is more important than its size. This is not to say that size is unimportant;
rather, character denotes the stuff out of which samples are made, and if the
blend is wrong, the character cannot be changed simply by increasing the
size of the sample.

The character of a sample, whether or not it is representative of the universe
to be studied, ordinarily must be evaluated in terms of (1) a knowledge
of the methods used to obtain it, or (2) the extent to which it predicts the
behavior expected of the universe, or both.

Once we are warranted in assuming that the character of a sample is
satisfactory, we can consider the size of sample necessary for the degree of
precision that will be adequate for the purposes of our investigation. If, in a
poll of voters’ preferences for candidates A and B, we employ a technique that


SOME FURTHER CONSIDERATIONS ABOUT SAMPLING 


315 


we can be certain yields a random result for the universe sampled, we
must determine how large a sample we need for a result in which we can
have sufficient confidence to make a prediction. We can have more
confidence in the results obtained with a sample of a given size if 75% favor
candidate A and 25% candidate B, than if 52% favor candidate A and 48%
candidate B. In other words, the size of sample required for such an
investigation depends upon the closeness of the poll result. Larger samples
are required in order for smaller differences to be significant.
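The effect of the closeness of the result can be put in numbers. The sketch below assumes simple random sampling and a two-candidate race, and uses the familiar standard error of a proportion, sqrt(p(1 - p)/n), with the sample share p standing in for the universe value; the function names are ours, for illustration only.

```python
import math

def lead_z(p_a, n):
    """Distance of candidate A's share from an even split, in standard errors."""
    se = math.sqrt(p_a * (1 - p_a) / n)   # standard error of a proportion
    return (p_a - 0.5) / se

def n_needed(p_a, z=1.96):
    """Smallest n for which a share p_a lies z standard errors away from .50."""
    return math.ceil(z ** 2 * p_a * (1 - p_a) / (p_a - 0.5) ** 2)

for p in (0.75, 0.52):
    print(f"share {p:.2f}: z with 200 cases = {lead_z(p, 200):.2f}, "
          f"cases needed = {n_needed(p)}")
```

With 200 cases a 75-25 split lies about 8 standard errors from an even division, while a 52-48 split lies only about 0.57 of one; roughly 2,400 cases would be needed before a 52% share lay 1.96 standard errors from .50.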

When the universe is finite, a random sample of a given size will yield a 
more precise result for a small than for a large universe. Similarly, the more 
heterogeneous the character of the universe to be sampled, the larger will be 
the sample required; and conversely, as in the case of the drop of blood, the 
more homogeneous the character of the universe, the smaller will be the 
sample required. 
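Both statements follow from the usual formulas for simple random sampling without replacement (assumed here; the function names are hypothetical). The standard error of a mean is sigma/sqrt(n), multiplied for a finite universe of N members by the finite population correction sqrt((N - n)/(N - 1)); and the sample size needed for a given margin grows with the square of the universe's standard deviation, which serves here as a measure of its heterogeneity.

```python
import math

def se_mean(sigma, n, N=None):
    """Standard error of a sample mean; pass N for a finite universe of N members."""
    se = sigma / math.sqrt(n)
    if N is not None:
        # Finite population correction: shrinks toward 0 as n approaches N.
        se *= math.sqrt((N - n) / (N - 1))
    return se

def n_for_margin(sigma, margin, z=1.96):
    """Cases needed so that z standard errors of the mean do not exceed `margin`."""
    return math.ceil((z * sigma / margin) ** 2)

# The same sample of 400 drawn from universes of different sizes (sigma = 10).
for N in (1_000, 100_000, None):
    print(N, round(se_mean(10, 400, N), 4))

# A more heterogeneous universe (sigma doubled) needs about four times the cases.
print(n_for_margin(10, 1), n_for_margin(20, 1))
```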

The possible usefulness of relatively small but carefully stratified samples 
in public opinion research is illustrated by the following research from
Cantril.* The Office of Public Opinion Research at Princeton University attempted
to predict the outcome of the New York gubernatorial election in 1942 by 
means of a sample of only 200 voters in that state who were selected to 
provide a cross-section of the voters with respect to color, economic status, 
and age. All the interviewing was done by one person in the week before 
election day, the interviews being limited to registered voters. Table 11:6 
shows a comparison of the OPOR results with actual election results and with 
the larger samples used in the American Institute of Public Opinion poll and 
the New York Daily News poll. 


Table 11:6

                                    Gubernatorial Candidates
                                 Dewey    Bennett    Alfange    Number of Cases
Actual election results           53%       37%        10%        4,112,000
OPOR prediction                   58        36          6               200
AIPO prediction                   53        39          8             2,800
N.Y. Daily News prediction        57        37          6            48,000


Thus the carefully stratified OPOR sample of 200 voters yielded a
prediction almost as precise as the large sample of 48,000 voters used in the
Daily News poll. Much more consideration was given to the character of the 
small sample than the very large. It will be noted that Gallup’s poll for the 
AIPO, in which the quota sample was carefully stratified but was 14 times 
larger than that used for the OPOR, predicted the percentage of Dewey’s 
vote without error. 


* Op. cit., chap. 12, “The Use of Small Samples.”
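One simple way to summarize Table 11:6 is the mean absolute error of each poll, in percentage points, across the three candidates. This yardstick is our choice for illustration; the text compares the polls informally.

```python
ACTUAL = {"Dewey": 53, "Bennett": 37, "Alfange": 10}  # Table 11:6, actual results (%)
POLLS = {                                              # Table 11:6 predictions (%)
    "OPOR": {"Dewey": 58, "Bennett": 36, "Alfange": 6},        # 200 cases
    "AIPO": {"Dewey": 53, "Bennett": 39, "Alfange": 8},        # 2,800 cases
    "Daily News": {"Dewey": 57, "Bennett": 37, "Alfange": 6},  # 48,000 cases
}

def mean_abs_error(predicted, actual):
    """Average absolute miss, in percentage points, over all candidates."""
    return sum(abs(predicted[c] - actual[c]) for c in actual) / len(actual)

errors = {poll: mean_abs_error(pred, ACTUAL) for poll, pred in POLLS.items()}
for poll, err in errors.items():
    print(f"{poll:<11} {err:.2f}")
```

The OPOR sample misses by about 3.3 points on average, against about 2.7 for the Daily News poll and 1.3 for the AIPO poll, in line with the comparison drawn in the text.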











316 


SAMPLES AND SAMPLING TECHNIQUES


It must not be concluded from the foregoing example that a small sample 
of 200 cases for a universe of more than 4 million will generally yield such an 
adequate result. Rather, this example illustrates the relatively greater
importance of the character of a sample as against sheer size.

Accidental Samples 

The accidental in sampling is often confused with randomness. If a sample 
is drawn in an unplanned or haphazard fashion, if the cases that happen to 
be conveniently available are used, the sample is most likely to be an
accidental, not a random, sample. The technique for the latter type of sample,
as we have seen, usually requires a great deal of systematic planning and 
control in order to guarantee that each member of the universe to be studied 
will have an equal chance of appearing in the sample. 

Accidental samples are ignorant samples; they represent no known
universe. The methods of sampling statistics have no logical application in an
analysis of the results of accidental sampling. Yet this non-sampling device 
(it really should be characterized negatively, since it is not a technique of 
sampling) has been all too prevalent in academic research. McNemar * 
emphasizes this point ironically in his reference to the samples so often used 
for doctoral dissertations, test standardizations, etc.: “It is here that the 
college sophomore has an advantage in being the raw stuff out of which 
psychologists build a science of human behavior.” 

Following his work with Wundt in the “new experimental psychology”
at Leipzig during the last quarter of the 19th century, James McKeen Cattell
began to pay some attention to the facts of individual differences. Cattell
was not satisfied with the attempt to develop a science of mind by means of
intensive observations and measurements with one or two students as
subjects, the characteristic approach of the Wundtian experimentalist. As a
result, the number of subjects was increased. But only recently have
psychologists and other social scientists generally realized that sheer numbers
are oftentimes not sufficient. The character of the sample is of prime importance.
Therefore accidental samples, being without character, have no place in 
scientific research. 

Restricted Universes and Partial Investigations 

The importance of defining the universe to be sampled has already been
emphasized. Adequate samples are not in themselves ever restricted, but they
may be samples of restricted universes. Thus, in standardizing psychological
tests for personnel work in business and industry, the empirical process of
trial and error has made it clear that there is little sense in trying to sample
the generality of mankind (least restricted universe), or of adults in the
United States (somewhat more restricted), or of all adults in industrial work
(even more restricted), and so on. Rather, the relatively restricted universes
of a given business or industrial organization are used to develop the norms
that will prove most useful for its practical problems of employee selection,
upgrading, etc.

* Q. McNemar, “Sampling in Psychological Research,” Psychological Bulletin,
37:331-365, 1940.

The extent to which a universe is restricted is thus relative. The universe 
of housewives in Colorado is relatively restricted as compared with the
universe of housewives in the United States; but compared with the universe of
housewives in Boulder, Colorado, it is relatively unrestricted. 

A series of observations or measurements drawn from the lower or upper 
part of a distribution derived from part of a universe is sometimes referred to 
as a restricted sample. The high I.Q. children in a school population would
constitute such a group. However, it is better not to employ the concept
sample in this connection unless the universe to be studied is similarly
restricted. Such restricted parts of a whole are better described as partial
groups, and the study of their traits and behavior as partial investigations.

The Analysis of Intra-Group Differences in Sampling 

The analysis of similarities and differences in the sample results of
sub-universes, particularly in public opinion and consumer research surveys,
often gives valuable information as well as considerable insight into the basis
for the character of the total result. This is why additional information about
the characteristics of respondents is obtained during interviews. Such
information not only serves the purpose of determining whether a sample is a
typical cross-section in certain respects, but also provides relevant data for the
analysis of intra-group differences.

The Gallup poll on the loan to England, mentioned earlier, made the
following breakdown of the results for three characteristics of the respondents,
viz., occupation, education, and political party affiliation:


"England plans to ask this country for a loan of three to five billion 
dollars to help England get back on its feet. Would you approve or 
disapprove of the United States making such a loan?" 



                              Approve   Disapprove   No Opinion    Total
National sample result          27%        60%          13%       (100%)
By occupation
  Business and professional     37%        55%           8%       (100%)
  White collar                  35         54           11        (100 )
  Farmers                       26         62           12        (100 )
  Manual workers                20         65           15        (100 )
By education
  College                       45%        50%           5%       (100%)
  High school                   32         58           10        (100 )
  Grammar and no school         22         63           15        (100 )
By party
  Republicans                   28%        62%          10%       (100%)
  Democrats                     28         59           13        (100 )




Apart from the over-all result, it is evident that at least half of each
of the three strata groups (occupation, education, and political party
affiliation), and of every subgroup within each stratum, disapproved the loan.
Republicans and Democrats alike disapproved in a ratio of more than 2 to 1; 
in other words, “politics” did not enter into the result. However, there were 
rather marked differences in the opinions of those with a college education 
and those with less education; of manual workers and white-collar workers; 
of farmers and those in business and professional occupations. The more than 
3 to 1 opposition of manual workers is particularly interesting in view of the 
fact that England’s Labor government was seeking the loan for the purpose 
of improving the living conditions of “the working classes.” 

We have already seen that cross-tabulation of the data of different sampling 
strata reveals the relative importance of the relations between the stratifying 
factors and the opinions or behavior studied. Although the above breakdowns 
of Gallup’s results are not in a form suitable for analysis by correlation— 
unless it can be assumed that the sizes of the subgroups of a stratum are 
equal—the percentage comparisons of the opinions of each subgroup indicate 
no correlation between political party affiliation and attitude toward the 
loan, some correlation between occupation and attitude, and more marked 
correlation between education and attitude. From the discussion of
stratification, it should be clear that occupation and education are relevant for
stratification because of the correlation of these characteristics with the respondents’
attitudes toward the proposition. Political party affiliation would not be 
relevant for stratification in this particular case. However, for a question of 
such public importance and one that would require congressional action, the 
knowledge as to whether there are or are not differences in the opinions of 
Republicans and Democrats is relevant. 
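The percentage comparisons above can be reduced to one crude figure per stratum: the spread between the highest and lowest approval percentage among its subgroups. This is an illustrative device of ours, not a correlation coefficient, and, as the text cautions, it ignores the unequal sizes of the subgroups.

```python
# Approval percentages from the Gallup breakdown above.
APPROVE = {
    "occupation": {"Business and professional": 37, "White collar": 35,
                   "Farmers": 26, "Manual workers": 20},
    "education": {"College": 45, "High school": 32, "Grammar and no school": 22},
    "party": {"Republicans": 28, "Democrats": 28},
}

def approval_spread(groups):
    """Highest minus lowest approval percentage within one stratum."""
    return max(groups.values()) - min(groups.values())

spreads = {stratum: approval_spread(groups) for stratum, groups in APPROVE.items()}
print(spreads)
```

The spreads (0 points for party, 17 for occupation, 23 for education) rank the three strata in the same order as the verbal conclusion in the text.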

It is not possible, from these data as presented, to determine whether there
is much or any correlation between education and occupation of the
respondents. Presumably there is some, because of the usually greater incidence of
the high-school and college educated among the white-collar and professional
workers and the lower incidence of respondents with such education among
farmers and manual workers. The divisions of attitudes among the
occupational and educational strata are certainly not incompatible with the
possibility of such correlation.

Finally, in analyzing intra-group differences in sample results, it should 
be emphasized that not only the character but also the size of each such part 
of a sample must be considered if the differences are to be taken seriously. 
Subdivisions of a stratum that include only a few cases may be inadequate 
for satisfactory precision for the particular result. 

Sampling in the Experimental Method of Equated Groups 

The discussion of sampling in this chapter has been oriented primarily
toward the problem of sampling universes in an effort to study them and
ascertain their nature. Sampling has been developed as a useful alternative to
a census of finite and existent universes, and is the only possible technique for
the study of infinite and hypothetical universes. Problems of sampling,
however, arise not only in studying the nature of universes and in analyzing
intra-group differences, but also in using the experimental method. Whether the
experimental problem is in the field of psychophysics or involves determining
the possible effect on working efficiency of such a factor as method of work,
frequent rest periods, a financial-incentive plan, or a drug, consideration
must be given to the character of the sample, not merely its size.
A particular method of work may improve the efficiency of some people but
not others. Homo sapiens, in general, has a brain: he thinks and remembers;
he sees, hears, feels, tastes, and smells. But far from functioning
homogeneously in these respects, Homo sapiens is essentially variable in the
extent to which and the manner in which he utilizes his psychological and
social capacities.

The only empirical means of determining the extent to which a
psychophysical principle holds for people in general is to test it with many
different kinds of people: with men and women, with individuals of different
chronological ages, etc.

Determinations like the preceding can often be made by the use of
independent samples, i.e., samples chosen from different universes or
sub-universes, as well as from the same universe in the replication of an
experiment. Independent samples are those chosen in such a way that the
selection of the units in one sample is in no way affected by the selection of
the units in the other. Thus two or more random samples are, by definition,
independent samples.

Samples matched to establish equated groups, on the other hand, are
dependent samples. They are of great value in many experimental problems,
particularly in controlled experimentation, which requires the use of a control
group and of one or more experimental groups, depending on the problem.
Matched sampling of two equated groups consists in pairing the sampling
units of each group with respect to factors that are known or thought to be
correlated with the experimental variable. Thus, if we wish to determine
whether vision affects the quality of work done in a given industrial situation,
we might have a sample of these workers examined for vision, and supply
glasses where indicated. We could rate the work of those with glasses before
and after their vision was corrected, and analyze any differences.* However,
as is well known, individual output varies for many reasons, and there is often
an increase in efficiency with increased experience, for example, regardless of
other factors. Therefore, in order to isolate the possible effect of the factor of
corrected vision, the scientific procedure is to match two groups in relevant
respects, use one as a control, and subject the other to the conditions of the
experimental variable. In this case, relevant factors for matching might be
(1) initial condition of defective vision (determined for all subjects by routine
eye examinations), (2) length of time on the job, (3) skill or proficiency in
work, and possibly (4) age.

* The importance of “corrected vision” in certain types of industrial work was
well established during the past war at the Sperry Gyroscope Co. in New York.
Cf. J. H. Coleman, “The Visual Skills of Precision Instrument Assemblers,”
Journal of American Psychology, 9:165-170, 1945.

The possible effectiveness of a paired matching procedure is analogous to 
the possible effectiveness of stratified sampling. The factors chosen are controls 
to the extent that they correlate with the experimental variable. If there is 
little or no correlation, they cannot function as controls. In other words, if 
age is not correlated with efficiency on the job, it would not serve as a control 
factor. Furthermore, as in stratified sampling, (1) the principle of diminishing
returns is likely to operate as the number of control factors is increased, and
(2) it becomes increasingly difficult to match two groups, pair by pair, for
more than a few factors.

Matched samples of equated groups are dependent samples since the
selection of each case in one sample is dependent on a case in the other sample.
Such samples are often established as partial rather than purely random 
samples. That is, a sample of 300 workers doing a given type of job in an 
industrial situation might be initially selected by a random method from a 
restricted universe of 3000 workers with defective vision. But in attempting 
to divide and pair the 300 subjects in respect to the four control factors 
mentioned above, a number of them would probably have to be eliminated 
because of the absence of “mates.” In fact, the difficulties of pairing for four 
factors are such that an experimenter is fortunate if he can establish two 
samples of 100 cases each, matched pair by pair. The matched groups 
would thus be partial samples of the original random sample of 300 cases. 
One group would serve as the control, and the other would be “exposed” to 
the experimental factor, “corrected vision.” 
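The pairing step can be sketched as exact matching on the control factors. Everything here is hypothetical: the field names, the banding of each factor into a few levels, and the coin flip that decides which member of a pair is exposed (the text does not say how that choice is made). Subjects without a "mate" in the same cell are simply dropped, which is why matched groups end up as partial samples.

```python
import random
from collections import defaultdict

FACTORS = ("vision", "tenure", "skill", "age")  # hypothetical banded control factors

def match_pairs(subjects, factors, rng):
    """Pair subjects that agree on every control factor; return (pairs, n_dropped).

    Each pair is (experimental, control); unmatched subjects are discarded.
    """
    cells = defaultdict(list)
    for s in subjects:
        cells[tuple(s[f] for f in factors)].append(s)
    pairs, dropped = [], 0
    for members in cells.values():
        rng.shuffle(members)
        while len(members) >= 2:
            a, b = members.pop(), members.pop()
            # Coin flip: which member of the pair receives the experimental factor.
            pairs.append((a, b) if rng.random() < 0.5 else (b, a))
        dropped += len(members)  # 0 or 1 subject left without a mate
    return pairs, dropped

rng = random.Random(7)
# Hypothetical random sample of 300 workers, each factor coded in 3 or 4 bands.
workers = [{"id": i, "vision": rng.randrange(3), "tenure": rng.randrange(4),
            "skill": rng.randrange(4), "age": rng.randrange(4)} for i in range(300)]
pairs, dropped = match_pairs(workers, FACTORS, rng)
print(len(pairs), "matched pairs;", dropped, "workers without a mate")
```

With four factors cut into 192 cells, a sample of 300 loses a substantial fraction of its members for want of mates, which illustrates the difficulty the text describes.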

In a well-designed investigation, the working and other conditions for both 
the control and the experimental groups would be kept as much alike as 
possible at least pair by pair, if not for each group as a whole. Only the 
experimental group, of course, would be exposed to the experimental factor,
“corrected vision.” After an appropriate length of time (this is in itself a 
variable for different kinds of work and work situations), the efficiency of the 
two groups would be measured and compared. If there were a significant 
difference in performance in favor of the experimental group, this would 
indicate a difference greater than could be expected on the basis of chance. 
Nor could it be attributed to (1) the initial defective vision, (2) length of 
time on the job, (3) skill or proficiency at the beginning of the experiment, 
or (4) age, because these factors were controlled by the pairing procedure in 
matching the two groups. Therefore, the difference would be explained in 
terms of the experimental variable, “corrected vision,” and the importance of 
this factor in the particular situation would be established. 
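For matched groups the comparison is made pair by pair: take the difference within each pair and ask whether the mean difference is large relative to its standard error. The scores below are invented for illustration. For contrast only, the same numbers are also run through an independent-samples ratio, which is not appropriate for matched data, since it ignores the pairing and with it the control that the matching provides.

```python
import math
import statistics

def paired_t(experimental, control):
    """Mean within-pair difference divided by its standard error; returns (t, df)."""
    if len(experimental) != len(control):
        raise ValueError("matched groups must have the same number of pairs")
    diffs = [e - c for e, c in zip(experimental, control)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se, len(diffs) - 1

def unpaired_ratio(a, b):
    """Difference of means over the standard error for independent samples."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se

# Hypothetical output scores for 8 matched pairs after the experimental period.
experimental = [52, 48, 60, 55, 47, 58, 50, 53]
control = [50, 47, 57, 54, 45, 55, 50, 50]

t, df = paired_t(experimental, control)
print(f"paired t = {t:.2f} with {df} df")
print(f"ignoring the pairing: {unpaired_ratio(experimental, control):.2f}")
```

The pair differences are small but consistent, so the paired ratio (about 4.7 on 7 degrees of freedom) is large; the unpaired ratio (about 0.9) is swamped by differences between workers that the matching was designed to remove.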




The function of the control group is thus to give a basis for measuring what
the experimental group would have done had it not been subjected to the
experimental variable. Except for chance errors, the results yielded by two
relevantly matched groups continuously exposed to similar conditions should
be similar to each other. But if one condition is changed for one group (its
vision is corrected) and it is therefore called the experimental group, the
other group (the control) will serve as the yardstick against which to measure
what the behavior of the experimental group would have been had it not been
exposed to the experimental factor (corrected vision).

It should be clear from the preceding that there could be only a presumption 
of the generality of the effectiveness of corrected vision in all types of working 
situations. Whether such a factor would be effective in working situations 
other than the type used in the particular experiment could be definitely 
determined only by sampling other types of universes (i.e., other types of 
working situations). 

Experimental Method with Random Samples 

We said above that the equated group technique in experimental method is 
analogous in control respects to the stratified sampling technique. There is 
also an experimental technique that is analogous to purely random sampling, 
in which the basic control is the randomization of a universe. Such samples 
are independent rather than dependent, and hence are uncorrelated. 

A satisfactory control and experimental group can be set up by drawing
two random samples from the same universe. They could be “matched” in
only one basic factor, which, however, is the primary control factor of
randomization. Any difference between two such groups at the beginning of an
experiment should be no greater than would be expected on the basis of chance
itself. Hence, any difference between them at the conclusion of the experiment,
greater than could be expected on the basis of chance, would be attributable
to the experimental variable, provided other conditions for both groups were
kept similar during the experiment.
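Randomly dividing a universe into two groups is straightforward to sketch. The universe of baseline scores below is hypothetical; the check at the end compares the initial difference between the two groups with its standard error. If the randomization is sound, that ratio should usually be no larger than chance allows (within about two standard errors in roughly 95% of such divisions).

```python
import math
import random
import statistics

def random_halves(universe, rng):
    """Shuffle the universe and cut it into two equal random groups."""
    members = list(universe)
    rng.shuffle(members)
    half = len(members) // 2
    return members[:half], members[half:2 * half]

rng = random.Random(11)
baseline = [rng.gauss(100, 15) for _ in range(2_000)]  # hypothetical scores
control, experimental = random_halves(baseline, rng)

diff = statistics.fmean(experimental) - statistics.fmean(control)
se_diff = math.sqrt(statistics.variance(experimental) / len(experimental)
                    + statistics.variance(control) / len(control))
print(f"initial difference = {diff:.2f}, standard error = {se_diff:.2f}")
```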

This procedure is often used to establish groups matched only in the sense
that they are random samples from the same universe; both are then exposed
to experimental variables. Thus, in “split-run” copy testing of two
advertisements, one ad is printed in half the press run and the other ad in the
other half, the two ads being identical in all respects except for one
experimental variation (the headline, for example), and each is exposed to one
of two random samples of a universe.* The universe may be the entire
circulation of a magazine. It can be randomly divided into halves by alternating
the two advertisements in the particular issue of the magazine. By means of a
free-sample device “buried” in small print in a relatively less important part
of the ad, the relative pulling power of each ad can be evaluated in terms of
the number of replies to each one. Any difference in the number of replies,
greater than would be expected on the basis of chance, can be attributed to
the difference in the particular copy that was experimentally varied. The
copy “pulling” the greater number of replies is the more effective. However,
whether it would also be more effective for other universes (i.e., other types
of magazines, newspapers, display advertising, etc.) could be determined
only by sampling and testing it with these other universes.

* J. Zubin and J. G. Peatman, “Testing the Pulling Power of Advertisements by the
Split-Run Copy Method,” Journal of Applied Psychology, 29:40-57, 1945.
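Judging whether a difference in replies exceeds chance can be sketched with a two-proportion z-ratio. The press-run and reply counts are invented, and the sketch treats each copy as an independent chance to reply, which is a simplification.

```python
import math

def two_proportion_z(replies_a, copies_a, replies_b, copies_b):
    """z-ratio for the difference between two reply rates, using the pooled rate."""
    p_a, p_b = replies_a / copies_a, replies_b / copies_b
    pooled = (replies_a + replies_b) / (copies_a + copies_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / copies_a + 1 / copies_b))
    return (p_b - p_a) / se

# Hypothetical split run: 50,000 copies of each version of the ad.
z = two_proportion_z(replies_a=420, copies_a=50_000, replies_b=510, copies_b=50_000)
print(f"z = {z:.2f}")
```

A ratio near 3 would ordinarily be read as more than chance, favoring version B; as the text warns, that verdict applies only to the universe sampled.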

TERMINOLOGICAL DISTINCTIONS FOR SAMPLING
AND ANALYTICAL STATISTICS

In addition to the concepts and techniques for the sampling of statistical 
universes which have been developed in the preceding sections, there are 
several distinguishing concepts and symbols in sampling and analytical 
statistics that should be noted at this point. 

Parameters and “True Measures”

Any measure of the distribution of a statistical universe or population is
called a parameter. Percentages, means, standard deviations, centile values,
correlation coefficients, and similar measures derived from the data of
universes are parameters.

In the case of infinite populations, parameters are purely hypothetical 
measures, but their value can at times be closely estimated from large samples 
of observations. For some finite populations, actual parameter values can be 
computed. However, it should be borne in mind that such values are always 
subject to errors that occur in the process of observation or measurement 
itself. Consequently, although by definition they are parameters, such values 
can hardly be described as “true measures.” The concept true measure implies 
an errorless parameter value. Although such measures are obviously
hypothetical, statistical techniques have been developed that will yield an estimate
of them, with the attenuating effect of errors of observation or measurement 
theoretically eliminated. 

Some statisticians use the terms parameter values and true measures
synonymously. However, as the two terms are defined here, a parameter value is
not necessarily a true measure. Only if it is errorless is the parameter, by 
definition, a true measure. 

Statistics 

Any measure derived from the data of a sample is called a statistic. In 
practice, statistics are often referred to as obtained measures. However, not 
all obtained measures are necessarily statistics, for a measure may be
obtained for an entire finite population, in which case it is a parameter rather
than a statistic. 



TERMINOLOGY FOR SAMPLING AND ANALYTICAL STATISTICS 323


The measures obtained in the reduction of sample data, such as percentages, 
means, standard deviations, centiles, correlation coefficients, etc., are statistics. 
The concept statistic is used to differentiate sample measures from those of 
universes. 

Many of the techniques of sampling and analytical statistics have been 
developed in order to enable the estimation of the values of parameters from 
known values of statistics. As we have seen, it is of the essence of sampling 
and analytical statistics that universes be studied by means of the “evidence” 
of sample data. 

Symbols for the Differentiation of Parameters and Statistics

Since most of the measures actually used in sampling statistics are statistics 
rather than parameters, it is usually sufficient to differentiate only the latter. 
For this purpose we shall employ the subscript u, for universe. Thus, the mean
of a statistical universe, and hence its parameter value, will be signified by
M_u. Similarly, parameter values of centiles, standard deviations, and similar
measures will be signified by C_u, σ_u, etc. The subscript h will be frequently
used to denote a hypothetical value of a parameter. When it is necessary
to distinguish a statistic, the subscript s will be employed.

Some authors have drawn on the Greek alphabet for symbols for
parameters and on the English alphabet for symbols for statistics. This practice,
if universal, would no doubt offer the easiest way of clarifying the present
situation in statistical terminology, which is admittedly confused. However,
many Greek symbols already have established usages in descriptive and
sampling statistics. Thus, the Greek rho has long been used to symbolize
Spearman’s rank-difference correlation coefficient. Its use to represent a
universe r would be confusing, as would also the use of σ to represent only the
standard deviations of universes.

Sometimes a sample of a variable must be differentiated from the
universe of the variable. Thus, if the variable under consideration consists
of intelligence quotients and the variable itself is symbolized by I.Q., then
a sample of the variable will be differentiated from the universe by the same
subscript, s, as is used to distinguish a statistic: I.Q._s will refer to the sample
of the variable, and I.Q._u to the universe. If a variable in a formula is
algebraically symbolized by x or y, a sample will be designated by x_s to
differentiate it from x_u or x_h. Similarly, the number of cases in a sample will be
differentiated from the number of cases in a universe by N_s and N_u.

Sampling Distributions

Just as the data of a sample are called sample data, so the distribution of a
given statistic, such as a mean obtained from a series of samples, is called
a sampling distribution. Sampling distributions are composed of the values
of a given type of statistic derived from a series of random samples of uniform
size from the same universe. If we drew from a given universe 100 random
samples, each consisting of 1000 cases, a distribution of the 100 means of these
samples would be an empirical sampling distribution of the particular
statistic, namely, the mean.
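The example in the text can be carried out by simulation. The universe here is hypothetical (50,000 normally distributed scores with mean 100 and standard deviation 15). The 100 sample means form an empirical sampling distribution whose center can be compared with M_u and whose spread can be compared with σ_u/sqrt(1000).

```python
import math
import random
import statistics

rng = random.Random(3)
universe = [rng.gauss(100, 15) for _ in range(50_000)]  # hypothetical universe
mu_u = statistics.fmean(universe)
sigma_u = statistics.pstdev(universe)

# 100 random samples of 1000 cases each; keep only each sample's mean.
means = [statistics.fmean(rng.sample(universe, 1000)) for _ in range(100)]

print(f"universe mean {mu_u:.2f}; mean of the 100 sample means "
      f"{statistics.fmean(means):.2f}")
print(f"SD of the sample means {statistics.stdev(means):.3f}; "
      f"sigma_u / sqrt(1000) = {sigma_u / math.sqrt(1000):.3f}")
```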

A sampling distribution of a statistic is different from a distribution of 
the data of a sample. The latter is the ordinary distribution of the frequencies 
of a single sample. 

The concept sampling distribution is integral to the methods of sampling
and analytical statistics because the study of universes from the data of
samples is based on definite assumptions about the form of sampling
distributions, as well as upon estimates of the extent of their variation. For
example, many of the computations and estimates of analytical statistics
are based on the assumption that the form of the sampling distributions of
various statistics is similar to that of the normal probability curve. In other
cases, especially in very small samples, sampling distributions have a
different form. Thus, for a series of samples of less than 25 or 30 cases each, the
form of the sampling distribution of any statistic derived therefrom will depart
more and more from the normal probability curve, the smaller the size of the
sample. Sampling distributions of product-moment correlation coefficients
also are extremely skewed as this statistic approaches values of 1.0 or -1.0.
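The skewness of the sampling distribution of r near 1.0 shows up clearly in a simulation. The sketch assumes bivariate normal universes with correlations of 0 and 0.9 (our choices), draws 2,000 samples of 30 pairs from each, and computes the ordinary moment coefficient of skewness of the resulting r's.

```python
import math
import random
import statistics

def pearson_r(xs, ys):
    """Product-moment correlation coefficient of paired observations."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def sample_r(rho, n, rng):
    """r from n pairs drawn from a bivariate normal universe with correlation rho."""
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for x in xs]
    return pearson_r(xs, ys)

def skewness(values):
    """Moment coefficient of skewness: mean cubed deviation over SD cubed."""
    m = statistics.fmean(values)
    sd = math.sqrt(statistics.fmean([(v - m) ** 2 for v in values]))
    return statistics.fmean([(v - m) ** 3 for v in values]) / sd ** 3

rng = random.Random(5)
skews = {}
for rho in (0.0, 0.9):
    rs = [sample_r(rho, 30, rng) for _ in range(2_000)]
    skews[rho] = skewness(rs)
    print(f"rho = {rho}: mean r = {statistics.fmean(rs):.3f}, "
          f"skewness = {skews[rho]:.2f}")
```

For a universe correlation of 0 the r's scatter symmetrically; for 0.9 they pile up against the ceiling of 1.0 and trail off toward lower values, giving a clearly negative skewness.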

Small Sample Theory vs. Large Sample Theory

The concept large sample theory is used in statistical method to designate
the sampling distributions of statistics derived from samples that are
sufficiently large, and of a character, to yield distributions approximating the
normal probability curve, together with the techniques developed for them. The
concept small sample theory, on the other hand, is used to describe the sampling
distributions and related concepts derived for statistics of samples that are so
small in size (less than 25 or 30 cases) as to yield sampling distributions which
definitely diverge from the normal probability type. It should be observed,
however, that not all statistics derived from large samples have sampling
distributions that are normal in form. Correlation coefficients that approach
1.0 or -1.0, just cited, are an example. Such statistics as these are often
derived from very large samples, but the parameters themselves are of such
a character as to yield sampling distributions of a form other than the normal
type.

In practice, sampling distributions are usually hypothetical concepts.
In other words, it frequently happens that only the data of one or two samples
are available in a research investigation, and consequently, both the form of
the sampling distribution and the measure of the extent of its variation have
to be estimated. Many of the mathematical procedures in sampling and
analytical statistics have been devised for just this purpose, on the assumption
that the sampling distribution has a definite form and is based on random
sampling or stratified-random sampling, another reason for emphasizing the
importance of random sampling in analytical statistics. In fact, no sound
inferences or conclusions about specific universes are possible from a study
of samples derived therefrom unless the latter have been obtained by the
method of random or stratified-random sampling. Only when a sample is
known to have been drawn randomly and to have been derived from the
kind of universe which yields a definite type of sampling distribution can
inferences be made about the universe with confidence.

Standard Error of a Statistic 

The standard error of any statistic is the standard deviation of a sampling 
distribution of that measure. In practice, it is often necessary to estimate this 
deviation from only one or two samples. For some statistics, however, relevant 
hypotheses about the universes investigated are of such a kind as to enable 
a rather precise estimate of the standard deviation of the hypothetical
sampling distribution of a given statistic.
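When only one sample is at hand, the standard error of a mean is commonly estimated as s/sqrt(N_s), where s is the sample standard deviation; the sketch assumes s is computed with N_s - 1 in the denominator. The eight scores are invented, and with so few cases the ratio of a deviation to this estimate falls under small sample theory rather than the normal curve.

```python
import math
import statistics

def se_of_mean(sample):
    """Estimated standard error of the mean from a single sample: s / sqrt(N_s)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

scores = [12, 15, 11, 18, 14, 16, 13, 17]  # hypothetical sample
print(f"mean = {statistics.mean(scores)}, "
      f"estimated standard error = {se_of_mean(scores):.3f}")
```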

Statistical Hypotheses

The mean of a sampling distribution is, as we shall see in Chapter 13,
usually established by hypothesis, in which case it is a parameter value.
For example, in an attempt to determine from a random sample of
purchasers of X cigarettes whether the buyers are evenly divided between men
and women, we test the statistical hypothesis of a mean proportion of .50
for each sex group. These hypothetical proportions would be the parameter
values of the universe studied if the purchasers of X cigarettes were in fact
evenly divided between men and women. The research problem is to
determine whether a random sample yields a result that differs significantly
(by more than would be expected on the basis of chance) from the
hypothetical values of the universe sampled, viz., a proportion of .50 for each
sex group. If the random sample of purchasers, divided among men and women,
does differ significantly from these parameter values, we can confidently reject
the hypothesis that the universe of men and women purchasers of X cigarettes
is evenly divided.

In this type of test of a statistical hypothesis, as we have said, the mean value of the sampling distribution is established by hypothesis. In the preceding example, the extent of the variation in the hypothetical sampling distribution of proportions is likewise estimated from equations derived from the hypothesis. In other types of problems in sampling statistics, however, the sample data themselves must be used to estimate the degree of this variation. In either event, the variation is measured in terms of the standard deviation, and the standard deviation of any sampling distribution is called the standard error of the statistic in question.
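The idea that a standard error is nothing more than the standard deviation of a sampling distribution can be checked by simulation. The sketch below assumes the cigarette example with a hypothesized proportion of .50 and a hypothetical sample size of 400; it compares the familiar formula for the standard error of a proportion, σ_p = √(pq/N), which is not derived in this section and is assumed here, with the standard deviation of proportions from many simulated random samples. The function names are illustrative only.

```python
import math
import random
import statistics


def hypothesized_standard_error(p, n):
    """Standard error of a proportion implied by the hypothesis alone: sqrt(p*q/N)."""
    return math.sqrt(p * (1 - p) / n)


def simulated_standard_error(p, n, num_samples, seed=0):
    """Standard deviation of the sample proportions from many random samples of size n."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(num_samples):
        # Each draw is a p event (e.g., a male purchaser) with probability p.
        hits = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(hits / n)
    return statistics.pstdev(proportions)


p_hyp, sample_size = 0.50, 400
print(hypothesized_standard_error(p_hyp, sample_size))     # about 0.025
print(simulated_standard_error(p_hyp, sample_size, 2000))  # close to the value above
```

Because the hypothesis fixes p, the first function needs no sample data at all, which is the situation the text describes for the cigarette example.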

The standard error of a measure is also symbolized by σ with an appropriate subscript. Thus, $\sigma_M$ symbolizes the standard error of a mean; $\sigma_p$, that of a proportion; $\sigma_r$, that of a correlation coefficient; etc.



326 


SAMPLES AND SAMPUNG TECHNIQUES 

Probable Error of a Statistic 

The variability of a sampling distribution is sometimes measured in terms of P.E., the probable error. This measure, however, is based on the standard error: for sampling distributions that follow the bell-shaped, normal probability curve, P.E. is always equal to .6745 of the standard error, i.e., about two-thirds as large. Historically, P.E. has been used as a measure of the variability of normal, bell-shaped sampling distributions because the limits M ± 1 P.E. mark off the middle 50% of the distribution. Hence, in such a distribution a statistic has an equal chance of falling within or beyond these limits.
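The .6745 factor can be recovered from the normal curve itself: it is the distance, in standard-deviation units, from the mean to the 75th percentile. A short sketch using Python's standard `statistics.NormalDist` class; the helper `probable_error` is a hypothetical name, not a library function.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# The probable error is the half-width of the middle 50% of a normal distribution,
# i.e. the distance from the mean to the 75th percentile, in standard-error units.
pe_factor = standard_normal.inv_cdf(0.75)
print(round(pe_factor, 4))  # 0.6745


def probable_error(standard_error):
    """P.E. of a statistic whose sampling distribution is normal."""
    return pe_factor * standard_error


# Share of a normal sampling distribution lying within M ± 1 P.E.
share_inside = standard_normal.cdf(pe_factor) - standard_normal.cdf(-pe_factor)
print(round(share_inside, 4))  # 0.5
```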

Sampling Error and Error of Measurement

The concept sampling error is technically used in statistics to denote the 
difference between the value of a parameter of a universe and the value of
the statistic derived from a sample of that universe. Since truly random 
samples of a universe are affected only by chance errors of sampling, any 
measure of the sampling error is a measure of the probable effect of chance 
errors on a statistic. As previously emphasized, a measure of sampling error 
is based on the assumption that the sample itself is unbiased. 

On the other hand, errors of measurement are analogous to the physicist's errors of observation. They are the errors that occur in the procedures of observing, counting, measuring, etc. Their effect on a sample result can often be estimated, and hence allowed for, if it can be assumed that they are randomly distributed over a series of observations. If these errors of measurement occur randomly, then a sampling distribution of them will tend to follow the normal curve of error (the normal, bell-shaped probability distribution). The larger the series of observations or measurements, the smaller the effect of such errors on the sample result and on most statistics derived from it. This is so because, if the errors are distributed randomly, they tend to balance each other and hence cancel out in their net effect on the value of such a statistic as a mean or a proportion. In a few cases, however, chance errors of measurement distort the value of a statistic; for example, they tend to decrease the size of a correlation coefficient. Even so, on the assumption that such errors are randomly distributed, mathematical techniques are available that permit an estimate of their attenuating effect on a result, so that their effect can often be allowed for in evaluating the result.

EXERCISES 

1. What is the difference between a census and a sample? What circumstances require the use of samples rather than of censuses?

2. Define a statistical population or universe and give several examples in research of (a) finite and (b) infinite universes; of (c) actual and (d) hypothetical universes.




3. What kinds of errors necessarily are present in all sampling? Why? 

4. What kinds of errors should be avoided in sampling? Why? 

5. Define a representative sample result. Are sample results derived from a cross- 
section or stratification of a universe necessarily representative of the universe 
sampled? Why? 

6. Define a biased sample and state the kinds of factors or circumstances that make 
for bias in sampling. Give several examples of these factors. 

7. Define a random sample and a stratified-random sample. What do these two 
types of samples have essentially in common? What is the basic difference between 
them? 

8. Describe the techniques employed to insure the randomization of samples. 

9. Define a sampling unit and distinguish between initial or primary sampling units 
and the sampling unit per se. 

10. State the procedures by which a stratified-random sample of a given size can yield 
a more adequate result than a random sample of the same size. 

11. What kinds of research information can be derived from a stratified-random 
sample that ordinarily are not obtainable from a purely random sample?

12. What are internal controls in sampling, and how are they used to make the result more adequate?

13. On what fundamental assumptions and principles is the technique of areal sampling based? What is the essential difference between areal sampling and master sampling, and under what circumstances can the latter produce adequate results?

14. Define the random-point method of sampling and describe the difficulties inherent 
in it. 

15. Define the stratified-quota method of sampling and discuss its advantages and 
disadvantages. Is this method the same as stratified-random sampling? 

16. What is the primary control factor in all sampling techniques? 

17. Distinguish between the representativeness and the precision of a sample result, 
and define an adequate sample. 

18. Why is the character of the sample a more important consideration than its size? 

19. Distinguish between random samples and accidental or ignorant samples. 

20. In sampling, why is it necessary to study restricted universes? What is the difference between the sampling of a restricted universe and a partial investigation?

21. Describe the nature of the sampling procedures used in the experimental method of equated groups. What is the essential difference between dependent and independent samples?

22. What is the technical distinction between a parameter and a statistic? In what sense is it misleading to describe a parameter as a “true measure”?

23. What is the difference between a distribution of a sample and a sampling distribution?

24. Under what circumstances is the standard deviation of a distribution called the 
standard error? 

25. What is the difference between sampling errors and errors of measurement? 

26. Contrast the research procedures used to control errors of measurement and 
sampling errors. 



CHAPTER 12 


Probability and Statistical Inference 


A. THE STATISTICAL CONCEPT OF PROBABILITY 

If the individuals of a large universe are distributed in the proportions of 
exactly .50 females and .50 males, the most likely result of many random 
samples of, say, 1000 persons per sample drawn from that universe will be 
50% females and 50% males. These are the probability values of the occurrence
of two kinds of events, females and males, in this particular universe. The P
(probability) value for males (or for females) is expressed as .50, or 50 in 100,
or as the ratio 1/2.

We would not expect a single sample, or only a few samples, to give exactly 50% females and 50% males, because of the operation of chance errors in sampling. In fact, some result other than .50 and .50 is more likely, because many different proportions of males and females are possible purely on the basis of chance in random sampling; .50 and .50 represents but one combination of these possible results. Although this particular combination is more likely to occur than any other single combination (say, .45 females and .55 males), it is not more likely than all the other possible combinations taken as a whole.
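The claim that 50% females and 50% males is the single most likely result, yet far from a sure thing, can be checked numerically. A minimal sketch, assuming samples of 1000 persons from a universe with P = .50 and using the binomial probabilities developed later in this chapter; the function name is illustrative.

```python
from math import comb


def binomial_probability(k, n, p):
    """P value of exactly k p events (here, females) in a random sample of n persons."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 1000, 0.5
probs = [binomial_probability(k, n, p) for k in range(n + 1)]

most_likely = max(range(n + 1), key=lambda k: probs[k])
print(most_likely)               # 500, i.e. 50% females and 50% males
print(round(probs[500], 4))      # 0.0252: the single most likely result
print(round(1 - probs[500], 4))  # all other combinations taken together
print(probs[500] > probs[450])   # True: .50/.50 beats .45/.55 on its own
```

Even the most likely combination occurs in only about 1 sample in 40; the remaining combinations, taken as a whole, account for the rest.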

Definition of Probability 

The probability of an event is defined as the relative frequency of that event in all possible events of the class or universe under consideration. This is the frequency theory definition of probability. Although historically there have been other definitions, this one is basic to the statistical use of probability. It is based on the assumption that the instances or events of a universe are indefinitely repetitive. As von Mises says, “We call the probability of an attribute (experimental result) in a collective [universe] the limiting value of the relative frequency with which this attribute recurs in an indefinitely prolonged sequence of observations.” *

In terms of the preceding example, the limiting value of the relative frequency of males to persons (both males and females) should be .50, or 1 in 2, as a sequence of random samples is indefinitely prolonged for a universe of people evenly divided with respect to sex. If the parameter proportions of the attributes of a particular universe are not known (as is ordinarily the case), the proportion of an attribute (males) would be empirically determined by

* R. von Mises, Statistics, Probability and Truth, Macmillan, New York, 1939, p. 308.



drawing a prolonged series of samples (of persons) at random from that universe. If in this series the proportion of males approached .50 as a limit (the limiting value), the P value for males would be taken as equal to .50, or 1/2.

The probability, P, of an event (or attribute) is expressed algebraically as the following ratio:

[12:1] Probability ratio
$$P = \frac{p}{p + q}$$

where p equals the relative frequency of occurrence of the given type of event (or attribute) in a class of events; q equals 1 - p; and p + q is equal to unity, the total of all possible events in the class, ordinarily represented by 1.00 or 100%.

If the universe of persons referred to above is taken as a finite universe consisting of 100,000,000 members, and the P value for males is .50, this will mean that

$$P = \frac{f_p}{f_p + f_q} = \frac{50{,}000{,}000}{50{,}000{,}000 + 50{,}000{,}000} = \frac{1}{2} = .50$$

where 50,000,000 in the numerator equals $f_p$, the frequency of males, and the sum of the two figures in the denominator represents the total frequency for the universe.

If the universe of persons is taken as infinite, and an indefinitely prolonged series of observations (samples) of that universe yields .50 as the relative frequency of males,

$$P = \frac{p}{p + q} = \frac{.50}{.50 + .50} = \frac{1}{2} = .50$$

where p = .50 is the relative frequency of males in all possible events or attributes (males plus females) in the class of events (persons) in the universe. And P = .50 is the limiting value of the relative frequency with which males recur in an indefinitely prolonged series of observations of the sex of persons in a theoretically infinite universe.


A Single Event Has No P Value: The Concept of Likelihood

When we speak of the probability of a single event, we use the singular 
only metaphorically. From the point of view of the frequency theory of 
probability, a single instance or occurrence has no probability value. The 
probability of a single occurrence cannot be determined by the probability 
calculus. The implications of the theory of probability concern what happens 
on the average, or in the long run.

It might be argued that we can indicate the probability of rain on a given afternoon on the basis of having observed weather conditions only a few hours earlier. However, if it looks like rain and we say the probabilities are that it will rain, we mean that rain appears to be very likely, i.e., that all the relevant evidence (observations, etc.) warrants the judgment or conviction that this event will occur, rather than that the P value of rain is 90 in 100, or any other figure indicative of high probability. If, in tossing a coin, we say that the probability of its landing heads up is 1 in 2, we are using the probability concept to express our ignorance of how it will land. In the long run we can expect heads half the time, but we cannot predict what will happen in a single toss. The laws of chance cannot be used to compute a probability value for the behavior of a single occurrence or unique event; rather, they can be used to forecast what will happen on the average, or in the long run, for mass phenomena. And they may be used to judge what is likely or unlikely, on the basis of chance, for a single sample result.

Unfortunately, in lay usage the concept probability is confused with the concept “likely.” In statistical inference, we shall see that both concepts are needed, that they have distinctive meanings, and that the implications of each are essential to many types of problems. A single event has no calculable probability value, but its possible occurrence may be judged likely or unlikely on the basis of knowledge of the behavior of, and the antecedent conditions in, the particular class of events of which it is a member.

Strict Causality vs. Statistical Relations 

On the other hand, the occurrence of a single event that is strictly determined by attendant or antecedent conditions is considered to be certain, rather than highly probable or very likely. Strict causal relations are characterized as If-x-then-y relations, whereas statistical relations (which by definition are not strictly determined) have a probability value for what is expected to happen on the average, or in the long run. Statistical relations may be characterized as If-x-then-y-is-likely relations. In statistical inference it is usually necessary to judge, for a particular event or sample of observations, whether under circumstances x, y is or is not likely to occur.

Many physical laws are expressions of If-x-then-y relations; an example is the relation between the temperature at which water boils and atmospheric pressure. Statistical laws or relations, on the other hand, are expressions of If-x-then-y-is-likely relations. Thus, a sample of persons may yield a proportion of males (y) that is likely to be representative of (not identical with) the parameter proportion of males, if conditions x are satisfied, viz., a large sample is drawn randomly from the universe under consideration. Or, in the correlation of bi-variates, such as height and intelligence, we may find little or no relationship in a sample result and therefore be warranted in concluding that no relationship in the universe is likely.

The regression equations of correlation are a direct expression of If-x-then-y-is-likely relations:

$$y = f(x)$$

where the function is calculated in terms of the regression coefficient, $r_{yx}(\sigma_y/\sigma_x)$. Only if $r_{yx}$ were 1.00 would there be an If-x-then-y relation, as in the relation between the diameters and areas of circles. In natural and social phenomena, however, the correlation of bi-variates yields If-x-then-y-is-likely relations. The statistical problem is to determine the degree of the relation for a sample result, estimate the probable effect of chance errors on the result, and then judge what is or is not likely to occur in the light of all relevant information about the result and of experience with the phenomena under consideration.

B. THE BINOMIAL DISTRIBUTION AND THE NORMAL
PROBABILITY CURVE

Normal Sampling Distributions 

Sampling distributions for many statistics are normally distributed. The 
normal probability curve (Fig. 12:2) describes the way in which chance 
errors of observation and measurement affect a result. Such errors are not 
mistakes; rather, they represent the effect, on the behavior or quality being 
studied, of innumerable factors that are as likely to affect a result positively 
as negatively, favorably as adversely. 

The normal curve describes the form of the distribution of many qualities or traits of organisms, particularly those that are the consequences of innumerable determiners in genetic development and growth. It is as if the determiners operate to produce a combination of results that are distributed according to the laws of chance, some favorably, others unfavorably. Some combinations are rare; others are fairly frequent; still others, at the average, are very frequent. Distributions of I.Q. scores and of less generalized abilities, and distributions of such anthropometric traits as height and weight, have been found in certain types of populations to resemble the normal probability curve, provided the populations are not too heterogeneous as regards such factors as race, age, and sex.

Some insight into the relationship between the laws of chance and the normal probability curve can be obtained from a consideration of the point binomial $(p + q)^n$, in which p represents the probability of the occurrence of an event in a class, q represents the probability of its non-occurrence, and n is the size of the sample, Ns. For illustrative purposes, we shall continue to use the above example of a universe in which the sexes are evenly divided. The probability, P, of the occurrence of males, p, in random samples will be equal to the ratio 1/2, or P = .50; and q, the probability of the non-occurrence of males, will also be 1/2, or P = .50. Because the only alternative to the occurrence of males is the occurrence of females, q of the binomial represents females.

Equiprobability 

Since the probability of either p or q is 1/2, or .50, p and q are equiprobable. In other words, males are just as likely to be drawn in the samples as females, and vice versa. The equiprobability of events underlies the normal probability distribution. As was said in the preceding chapter, chance errors are errors that are just as likely to occur as not to occur; hence they are equiprobable. In drawing samples of persons from a universe composed of an equal number of males and females, the principle of equiprobability should operate, provided truly random samples are drawn. Let us see, first, what the results will be if small samples of 2 persons each are drawn, i.e., Ns = 2.

Binomial for Samples of Ns = 2

When Ns = 2, the result of any random sample of persons will consist of
one of the following three alternatives:

2 males 

1 male and 1 female 

2 females 

These three alternatives or variations are the only results possible with the universe sampled, 2 people per sample. Any of them may be obtained on the basis of purely chance factors in sampling. However, the combinations themselves are not equiprobable, even though the probability of the occurrence of a male is 1/2 and that of a female is 1/2. The three combinations are not equiprobable because one of them can occur in two different ways. In drawing a random sample of 2 persons, any one of the following four arrangements may be obtained: (1) m and m; (2) m and f; (3) f and m; (4) f and f. The combination of 1 male and 1 female can thus be obtained in two different ways, whereas the combination of 2 males or of 2 females can be obtained in only one way. The different ways in which a given combination can occur are known as its permutations.
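The counting argument for Ns = 2 can be reproduced by listing every ordered arrangement (permutation) of m and f and grouping the arrangements by the number of males. A sketch assuming equiprobable m and f; `combination_probabilities` is a hypothetical helper, and exact fractions are used so the P values match the text without rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def combination_probabilities(sample_size):
    """P value of each combination (number of males) when m and f are equiprobable."""
    # Every ordered arrangement (permutation) of m/f is equally likely.
    arrangements = list(product("mf", repeat=sample_size))
    ways = Counter(arr.count("m") for arr in arrangements)
    total = len(arrangements)
    return {males: Fraction(count, total) for males, count in sorted(ways.items(), reverse=True)}


print(list(product("mf", repeat=2)))
# [('m', 'm'), ('m', 'f'), ('f', 'm'), ('f', 'f')]
print(combination_probabilities(2))
# {2: Fraction(1, 4), 1: Fraction(1, 2), 0: Fraction(1, 4)}
```

Listing arrangements this way is only practical for small samples, since their number doubles with each added case.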

When p and q are equiprobable, the probability of any particular combination is therefore the ratio of the number of different ways it can occur to the total number of ways all the different combinations can occur. In this case the probability of the combination 1 m and 1 f is 2/4, or 1/2, whereas the probability of 2 m is 1/4, and that of 2 f is also 1/4. These values are readily obtained by the following expansion of the binomial:

[12:2] Binomial expansion when n = 2
$$(p + q)^n = (p + q)^2 = p^2 + 2pq + q^2$$

where n, the power of the binomial, is equal to the size of the sample, Ns.

We have assumed that the p and q events in the sample are equiprobable. Their respective P values are therefore 1/2. Hence, if we substitute their P values in the preceding equation, we have:

$$\left(\tfrac{1}{2}\right)^2 + 2\left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right) + \left(\tfrac{1}{2}\right)^2 = \tfrac{1}{4} + \tfrac{1}{2} + \tfrac{1}{4}$$

$$P = \underset{(2\,m)}{.25} + \underset{(1\,m,\ 1\,f)}{.50} + \underset{(2\,f)}{.25} = 1.0$$




The expansion of the binomial thus expresses the probability of each possible combination of results occurring in a prolonged series of random samples. The number of different possible combinations is equal to n + 1. The total number of different ways (permutations) in which all possible results can be obtained, when p and q are equiprobable, is equal to $2^n$. And, provided p = 1/2, the probability of any particular combination is the ratio of its number of permutations to the number of all possible permutations; the number of permutations for any combination is given by the coefficient of the binomial term for that combination, and the number of all possible permutations by $2^n$. Where Ns, the size of the sample, is equal to 2, the number of different combinations of possible results is

$$n + 1 = 2 + 1 = 3$$

and the total number of possible permutations, when p and q are equiprobable, is

$$2^n = 2^2 = 4$$

And, as already indicated, the probability value of each of the three combinations is as follows:

2 males, P = .25; 1 male, 1 female, P = .50; 2 females, P = .25
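The rules that there are n + 1 combinations and $2^n$ permutations, and that each combination's P value is its coefficient divided by $2^n$ when p = 1/2, can be checked for the sample sizes used in Fig. 12:1. A minimal sketch; `binomial_counts` is an illustrative name, and `math.comb` (Python 3.8+) supplies the binomial coefficients.

```python
from math import comb


def binomial_counts(n):
    """(number of combinations, number of permutations, P value of each combination), p = q = 1/2."""
    coefficients = [comb(n, k) for k in range(n, -1, -1)]  # p^n term first
    num_combinations = len(coefficients)                    # equals n + 1
    num_permutations = sum(coefficients)                    # equals 2**n
    p_values = [c / num_permutations for c in coefficients]
    return num_combinations, num_permutations, p_values


for n in (2, 3, 6, 12):
    combinations, permutations, _ = binomial_counts(n)
    print(n, combinations, permutations)
# 2 3 4
# 3 4 8
# 6 7 64
# 12 13 4096
```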

In the long run, for an indefinitely prolonged series of random samples, 2 persons per sample, we would expect to obtain the results shown in A in Fig. 12:1. This is the theoretical sampling distribution of a point binomial in which the p and q events are equiprobable and the size of the sample is 2. One-quarter, or 25%, of the sample results will consist of 2 males (or 2p); one-half, or 50%, will consist of 1 male and 1 female (or pq); and one-quarter, or 25%, of 2 females (or 2q).

It will be observed that the distribution for Ns = 2 and p = 1/2 in Fig. 12:1 is unimodal and bilaterally symmetrical. This distribution thus has two of the essential properties of the standard, normal probability curve. However, it is a discrete rather than a continuous distribution, and it is considerably more peaked than the normal probability curve.

The mean frequency of males (or p events) for a binomial distribution is as follows:

[12:3] Mean frequency of p events in a binomial distribution, $(p + q)^n$
$$M_f = N_s p$$

where $M_f$ is the mean frequency; $N_s$ is the size of the sample; and p is, by knowledge or by hypothesis, the proportion (or probability value) of p events in the universe under consideration. In this case, p = 1/2, or .50. Hence, the mean frequency of males in the sampling distribution shown in A in Fig. 12:1 is 1.0:

$$M_f = 2(.50) = 1.0$$




The standard deviation of a distribution of frequencies of p events for a binomial distribution is as follows:

[12:4] Standard deviation of the frequency of p events in a binomial distribution
$$\sigma = \sqrt{N_s p q}$$

where $N_s$ is the size of the sample, p is the proportion (or probability value) of p events, and q is 1.0 - p.

When Ns = 2 and p = .50,

$$\sigma = \sqrt{2(.50)(.50)} = .707$$
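Formulas [12:3] and [12:4] can be checked by computing the mean and standard deviation directly from the binomial probabilities, term by term. A sketch under the assumption that the P value of exactly k p events in a sample of n is the binomial term comb(n, k) p^k q^(n - k); the helper name is hypothetical.

```python
import math


def binomial_moments(n, p):
    """Mean and SD of the frequency of p events, computed term by term from (p + q)**n."""
    q = 1 - p
    # P value of exactly k p events in a sample of n.
    pmf = {k: math.comb(n, k) * p**k * q ** (n - k) for k in range(n + 1)}
    mean = sum(k * prob for k, prob in pmf.items())
    variance = sum((k - mean) ** 2 * prob for k, prob in pmf.items())
    return mean, math.sqrt(variance)


for n in (2, 3):
    mean, sd = binomial_moments(n, 0.5)
    # Compare with [12:3], M_f = Ns * p, and [12:4], sigma = sqrt(Ns * p * q).
    print(n, round(mean, 3), round(sd, 3), n * 0.5, round(math.sqrt(n * 0.5 * 0.5), 3))
```

For Ns = 2 both routes give a mean of 1.0 and a standard deviation of .707; for Ns = 3 they give 1.5 and .866.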


Fig. 12:1. The Theoretical Sampling Distributions of the Binomial $(p + q)^n$ When p and q Are Equiprobable, with the Size of the Samples, Ns, Equal to 2, 3, 6, and 12, and with the Total Areas of Each Distribution the Same. (Plotted from data in Table 12:1.)
[Figure: panels A to D (Ns = 2, 3, 6, 12); abscissa: frequency of p (males) per sample; ordinate: percentage of sample results.]


Probability Distributions 

The four binomial distributions in Fig. 12:1 have been drawn so that the effect of an increase in the size of the sample on the form of the sampling distribution may be seen. The possible frequencies of p events (males) per sample result are scaled on the abscissa. The ordinates are scaled in percentage of sample results. As indicated in Table 12:1, which contains the data for the distributions in the figure, the percentage of sample results for each possible frequency of p events (males) in random samples coincides with the P value (probability value) of each type of possible sample result. The distributions in the figure are thus probability distributions, and they are scaled so that each has the same total area on the chart. The total area is taken to a base of 100 (for per cent), or 1.0 (for P values expressed as proportions).

Table 12:1. Theoretical Percentage Distributions and P Values of Sample Results for the Binomial When Males (p) and Females (q) Are Equiprobable and the Size of Random Samples, Ns, Is 2, 3, 6, and 12

Frequency of Males     % Frequency of     Probability
per Sample             Sample Results     Value (P)

A. (When Ns = 2)
   2 males                 25%              .25
   1 male                  50%              .50
   0 males                 25%              .25
   Total                  100%             1.00

B. (When Ns = 3)
   3 males                 12.5%            .125
   2 males                 37.5%            .375
   1 male                  37.5%            .375
   0 males                 12.5%            .125
   Total                  100.0%           1.000

C. (When Ns = 6)
   6 males                  1.56%           .0156
   5 males                  9.38%           .0938
   4 males                 23.44%           .2344
   3 males                 31.25%           .3125
   2 males                 23.44%           .2344
   1 male                   9.38%           .0938
   0 males                  1.56%           .0156
   Total                  100.0%           1.000

D. (When Ns = 12)
  12 males                  0.024%          .00024
  11 males                  0.29%           .0029
  10 males                  1.61%           .0161
   9 males                  5.37%           .0537
   8 males                 12.08%           .1208
   7 males                 19.34%           .1934
   6 males                 22.56%           .2256
   5 males                 19.34%           .1934
   4 males                 12.08%           .1208
   3 males                  5.37%           .0537
   2 males                  1.61%           .0161
   1 male                   0.29%           .0029
   0 males                  0.024%          .00024
   Total                  100.0%           1.000
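The entries of Table 12:1 follow from the binomial coefficients divided by $2^{N_s}$. The sketch below regenerates them; it assumes p = q = 1/2 as in the table, `percentage_table` is an illustrative name, and the printed figures should agree with the table up to rounding.

```python
from math import comb


def percentage_table(sample_size):
    """Rows of (males per sample, % of sample results, P value) when p = q = 1/2."""
    total = 2 ** sample_size  # number of equally likely permutations
    rows = []
    for males in range(sample_size, -1, -1):
        p_value = comb(sample_size, males) / total
        rows.append((males, round(100 * p_value, 2), round(p_value, 5)))
    return rows


for ns in (2, 3, 6, 12):
    print(f"Ns = {ns}")
    for males, pct, p_value in percentage_table(ns):
        print(f"  {males:2d} males  {pct:6.2f}%  {p_value:.5f}")
```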


The Product and Addition Theorems of Probability 


Before considering further the effect of an increase in sample size on the 
form of the binomial distribution, we should be familiar with two of the 
theorems of probability that underlie the general theory of probability. They 
are the product theorem and the addition theorem. 

The product theorem states that the probability of the joint occurrence of two or more independent events in a class is equal to the product of their respective probabilities. Thus,

[12:5] Probability of the joint occurrence of independent events
$$P_{(a \cdot b \cdot c \cdots n)} = P_a \cdot P_b \cdot P_c \cdots P_n$$

In the foregoing example the probability of a single p event (the occurrence of a male) is 1/2, or .50. Hence in random samples when Ns = 2, the P value of the joint occurrence of 2p events (2 males) is (1/2)(1/2) = 1/4, or .25. This was the value obtained for $p^2$ (2 males) in the expansion of $(p + q)^2$, when p = 1/2 and n = 2. Similarly, the P value of zero males (2 females) is .25. The P value obtained for any single combination of results in the expansion of the binomial is, therefore, based on the assumption that the members of a sample result are drawn independently of each other. It makes no difference whether they are drawn simultaneously or successively (as in most sampling), so long as they are drawn independently. This independence is implicit in the principle of randomization. However, the P value of any given combination of results is also based on the addition theorem for the probability of alternative events.

The addition theorem states that the probability of two or more alternative (or disjunctive) events is equal to the sum of their respective probabilities. Disjunctive events are mutually exclusive; i.e., they cannot occur simultaneously in a sample result; they are “either-or” events.

[12:6] Probability of the occurrence of disjunctive events
$$P_{(a + b + c + \cdots + n)} = P_a + P_b + P_c + \cdots + P_n$$

We saw that there are two ways of obtaining the combination 1 male and 1 female, and that the P value of this result was 1/2, instead of 1/4 as in the case of 2 males (or 2 females). The addition theorem can be applied to this result. Although the combination 1 male and 1 female can be obtained in 2 ways when Ns = 2, viz., male and female, or female and male, only one of these ways can occur in a single sample result. Furthermore, it makes no difference which permutation* actually occurs, because the character of the sample result is the same, i.e., 1 male and 1 female, regardless of the order in which the two component members occur. Since the P value of a single permutation is 1/4 [(1/2)(1/2) = 1/4], and since the permutations of a given combination of results are actually alternative (or disjunctive) ways in which the result can be obtained, the P value of the combination is the sum of the probabilities of all the different permutations that can yield the particular combination. Thus,

$$P_{(a + b)} = \tfrac{1}{4} + \tfrac{1}{4} = \tfrac{1}{2}, \text{ or } .50$$

This is the same as the probability of a or b. Thus,

$$P_{a \text{ or } b} = \tfrac{1}{4} + \tfrac{1}{4} = \tfrac{1}{2}, \text{ or } .50$$

These two theorems enable us to determine the probability of the occurrence of alternative combinations when we know the probability of each. Thus, when Ns = 2 and the probability of p = 1/2, the probability of a sample result that will contain at least one p event (male) is

$$P_{(2\text{ males})} + P_{(1\text{ male},\ 1\text{ female})} = \tfrac{1}{4} + \tfrac{1}{2} = \tfrac{3}{4}, \text{ or } .75$$

where 1/4 is the P value for the combination of 2 males, and 1/2 is the P value of the combination of 1 male and 1 female. Either of these results will give a sample containing at least 1 male, but only one of these combinations can occur in the same sample result; hence they are mutually exclusive alternatives. The addition theorem gives 3/4, or .75, as the probability value of such a result.

From the foregoing, we see that each term of an expanded binomial represents a type (or combination) of result, and that the respective P values of each are based on both the product and the addition theorems of probability.
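The two theorems can be traced step by step for Ns = 2 with exact fractions. A minimal sketch; the variable names are illustrative, and independence of the two draws is assumed, as the text requires.

```python
from fractions import Fraction

P_MALE = Fraction(1, 2)
P_FEMALE = 1 - P_MALE

# Product theorem [12:5]: joint occurrence of two independent draws.
p_mm = P_MALE * P_MALE      # 2 males
p_mf = P_MALE * P_FEMALE    # male first, then female
p_fm = P_FEMALE * P_MALE    # female first, then male
p_ff = P_FEMALE * P_FEMALE  # 2 females

# Addition theorem [12:6]: the two permutations of "1 male, 1 female" are mutually exclusive.
p_one_each = p_mf + p_fm

# Addition theorem again: at least one male = 2 males OR (1 male and 1 female).
p_at_least_one_male = p_mm + p_one_each
print(p_one_each, p_at_least_one_male)  # 1/2 3/4
```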

Binomial for Samples of Ns = 3

If, instead of taking random samples of 2 persons at a time, we increase the size of each sample to 3 cases, the binomial is expanded as follows:

[12:7] Binomial expansion when n = 3
$$(p + q)^3 = p^3 + 3p^2q + 3pq^2 + q^3$$

There are four possible combinations (n + 1 = 4) of males and females, as follows: 3 males; 2 males and 1 female; 1 male and 2 females; and 3 females. It will be observed that the composition of each possible combination is indicated by the powers of p and q in each term of the binomial. Thus, $p^3$ represents the combination of 3 males; $p^2q$, the combination of 2 males and 1 female; etc. As already indicated, the coefficient of each term gives the number of permutations for each combination, i.e., the number of ways in which each combination may be obtained when p and q are equiprobable.

* The different ways in which a given combination can occur are known as its permutations.





For an indefinitely prolonged series of random samples, where p and q are 
equiprobable and TV, = 3, we would have 

(i + hy = ay + 3(i)Hj) + 3(my + ihy 

= i + 3(1) + 3(1) + i 

= 1+ I + 3 

(3 m) (2 m, If) (1 m, 2 f) (3 f) 

The probability for the first combination, 3 males, is 1/8, or .125. There are 
8 possible arrangements of results for samples of 3 cases from equiprobable, 
mutually exclusive events in a class (here, males and females), but there is 
only one way of obtaining the combination of 3 males. The probability for 
the second combination, 2 males and 1 female, is 3/8, or .375, since this 
combination can be obtained in any one of 3 ways; viz., m-m-f, f-m-m, or 
m-f-m. 
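The count of arrangements can be checked by brute force. The sketch below is an illustration, not the author's procedure. It assumes each person is recorded as "m" or "f", lists every equiprobable ordered arrangement with `itertools.product`, and then tallies the arrangements by number of males. The function name `combination_probabilities` is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def combination_probabilities(n_s):
    """Return {number of males: probability} for samples of n_s persons,
    assuming males and females are equiprobable (p = q = 1/2)."""
    # Every ordered arrangement such as ('m', 'f', 'm') is equally likely.
    arrangements = list(product("mf", repeat=n_s))
    # Group arrangements into combinations by counting the males in each.
    counts = Counter(arrangement.count("m") for arrangement in arrangements)
    total = len(arrangements)  # 2 ** n_s arrangements in all
    return {males: Fraction(counts[males], total) for males in range(n_s, -1, -1)}


probs = combination_probabilities(3)
for males, prob in probs.items():
    print(f"{males} m, {3 - males} f: P = {prob} = {float(prob):.3f}")
```

The 8 arrangements collapse into 4 combinations, and the counts 1, 3, 3, 1 are the binomial coefficients.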

The theoretical sampling distribution for an indefinitely prolonged series of random samples of persons, where Ns = 3, and males and females are equiprobable, is shown in B in Fig. 12:1. The mean of the distribution is

$$N_s p = 3\left(\tfrac{1}{2}\right) = 1.5$$

and its standard deviation is

$$\sqrt{N_s p q} = \sqrt{3\left(\tfrac{1}{2}\right)\left(\tfrac{1}{2}\right)} = .87$$
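These two values can be checked by computing the mean and standard deviation directly from the expanded terms and comparing them with the shortcut formulas Ns p and √(Ns p q). The sketch assumes those formulas hold for the point binomial in general. The function name `binomial_mean_sd` is hypothetical. The loop also covers Ns = 6 and Ns = 12, the sample sizes taken up below.

```python
import math
from math import comb


def binomial_mean_sd(n_s, p):
    """Mean and standard deviation of the number of p events (males)
    in samples of n_s, computed directly from the expanded binomial."""
    q = 1 - p
    # Probability of k males: coefficient times p**k * q**(n_s - k)
    probs = [comb(n_s, k) * p**k * q ** (n_s - k) for k in range(n_s + 1)]
    mean = sum(k * pk for k, pk in enumerate(probs))
    variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(probs))
    return mean, math.sqrt(variance)


for n_s in (3, 6, 12):
    mean, sd = binomial_mean_sd(n_s, 0.5)
    # Compare with the shortcut formulas Ns*p and sqrt(Ns*p*q)
    print(n_s, round(mean, 2), round(sd, 2),
          round(n_s * 0.5, 2), round(math.sqrt(n_s * 0.25), 2))
```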


Binomials for Larger Samples 

The expansion of a binomial for n = 2 and n = 3 is relatively simple because (p + q)^2 can be obtained by multiplying (p + q) by (p + q). Similarly, (p + q)^3 can be obtained by multiplying the expansion of (p + q)^2 by (p + q), as follows:

$$\begin{aligned}
(p^2 + 2pq + q^2)(p + q) &= (p^3 + 2p^2q + pq^2) + (p^2q + 2pq^2 + q^3) \\
&= p^3 + 3p^2q + 3pq^2 + q^3
\end{aligned}$$
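The repeated multiplication can be mechanized by keeping only the coefficients. In this minimal sketch, both the list representation and the function name `multiply_by_p_plus_q` are assumptions made for illustration. Multiplying by p leaves each coefficient in place, multiplying by q shifts it one place toward q^n, and the two partial products are then added column by column, as in the two partial rows of the multiplication in the text.

```python
def multiply_by_p_plus_q(coeffs):
    """Multiply an expansion by (p + q).

    coeffs[i] is the coefficient of p**(n - i) * q**i, so [1, 2, 1]
    stands for p^2 + 2pq + q^2.
    """
    times_p = coeffs + [0]  # multiplying by p: each term keeps its power of q
    times_q = [0] + coeffs  # multiplying by q: each term moves one place toward q^n
    # Add the two partial products column by column.
    return [a + b for a, b in zip(times_p, times_q)]


coeffs = [1]  # (p + q)^0
for n in range(1, 7):
    coeffs = multiply_by_p_plus_q(coeffs)
    print(n, coeffs)
```

Each pass costs only one addition per coefficient. The labor the text mentions comes from carrying the letters p and q through every step by hand.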

This process can of course be repeated for (p + q)^4, for (p + q)^5, etc., but it is time-consuming and arduous for larger values of n. The computation can be simplified by the following general formula for the expansion of a binomial to any power n:

$$(p + q)^n = p^n + \frac{n}{1}p^{n-1}q + \frac{n(n-1)}{1\cdot 2}p^{n-2}q^2 + \frac{n(n-1)(n-2)}{1\cdot 2\cdot 3}p^{n-3}q^3 + \cdots + q^n$$

[12:8] Binomial for any power of n
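Formula 12:8 can be coded directly. The sketch below forms the numerator n(n - 1)...(n - r + 1) and the denominator 1 · 2 · ... · r for the term in p^(n-r) q^r. The function name `coefficient` is hypothetical. Python's standard `math.comb` computes the same quantity and is used here only as a cross-check.

```python
from math import comb, prod


def coefficient(n, r):
    """Coefficient of p**(n - r) * q**r from the general binomial formula:
    n(n - 1)...(n - r + 1) divided by 1 * 2 * ... * r."""
    numerator = prod(range(n - r + 1, n + 1))  # r factors; empty product is 1 when r = 0
    denominator = prod(range(1, r + 1))        # 1 * 2 * ... * r
    return numerator // denominator            # the division is always exact


print([coefficient(6, r) for r in range(7)])
print([coefficient(12, r) for r in range(13)])
# Cross-check against the standard library's binomial coefficient
assert all(coefficient(12, r) == comb(12, r) for r in range(13))
```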



BINOMIAL DISTRIBUTION AND PROBABILITY CURVE 




Binomial for Ns = 6 


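If Ns, the sample size, is taken as 6, we have the following:

$$\begin{aligned}
(p + q)^6 = {} & p^6 + \frac{6}{1}p^5q + \frac{6\cdot5}{1\cdot2}p^4q^2 + \frac{6\cdot5\cdot4}{1\cdot2\cdot3}p^3q^3 \\
& + \frac{6\cdot5\cdot4\cdot3}{1\cdot2\cdot3\cdot4}p^2q^4 + \frac{6\cdot5\cdot4\cdot3\cdot2}{1\cdot2\cdot3\cdot4\cdot5}pq^5 + \frac{6\cdot5\cdot4\cdot3\cdot2\cdot1}{1\cdot2\cdot3\cdot4\cdot5\cdot6}p^0q^6
\end{aligned}$$

The last term simplifies to q^6, since the value of its coefficient is 1.0; p^0 also equals 1.0 because any nonzero number to the zero power is equal to 1.0.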

Before the probability values of males and females are substituted in the above expansion of (p + q)^6, it should be noted that when p and q are equiprobable, their products for any term of the expansion will be the same as p^6 (or q^6). Thus, when the probability of p = 1/2, p^6 is 1/64; p^5q is 1/64; p^4q^2 is 1/64, etc. Hence, when p = 1/2, it is sufficient to solve first for p^n and then determine the values of the coefficients of each term in the formula. When n = 6, the coefficients are 1, 6, 15, 20, 15, 6, and 1, respectively. We thus have the following distribution of results:

$$\left(\tfrac{1}{2} + \tfrac{1}{2}\right)^6 = \tfrac{1}{64} + \tfrac{6}{64} + \tfrac{15}{64} + \tfrac{20}{64} + \tfrac{15}{64} + \tfrac{6}{64} + \tfrac{1}{64}$$

(6 m) (5 m, 1 f) (4 m, 2 f) (3 m, 3 f) (2 m, 4 f) (1 m, 5 f) (6 f)

The first and last combinations can occur in only one way, whereas the combination of 5 males and 1 female can occur in six different ways; the 4 males and 2 females, in 15 different ways; the 3 males and 3 females, in 20 different ways, etc. All these different combinations can be expected to occur purely on the basis of chance in sampling.

The theoretical sampling distribution for Ns = 6 is shown in C in Fig. 12:1. The mean is 3.0 and the standard deviation is 1.22. The probability ratio for extreme combinations has decreased considerably. Thus, when Ns = 6, we would expect to obtain all males in only 1 of 64 samples. Hence, such an extreme result with only 1 or 2 samples is unlikely; i.e., it would be most unusual in the light of experience.


Binomial for Ns = 12

Let us see what the situation is for samples doubled in size, i.e., Ns = 12. The expansion of (p + q)^12, where p = 1/2, is left as an exercise for the student. However, 2^12 is 4096; hence (1/2)^12 is 1/4096. The coefficients for each of the 13 possible combinations of results, and hence their relative frequency, are as follows:* 12 m = 1; 11 m and 1 f = 12; 10 m and 2 f = 66; 9 m and 3 f = 220; 8 m and 4 f = 495; 7 m and 5 f = 792; 6 m and 6 f = 924; 5 m and 7 f = 792; 4 m and 8 f = 495; 3 m and 9 f = 220; 2 m and 10 f = 66; 1 m and 11 f = 12; and 12 f = 1.
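The P values for the extreme combinations follow from these coefficients divided by 4096. A small sketch, assuming p = q = 1/2 and using exact fractions. The variable names are illustrative only.

```python
from fractions import Fraction
from math import comb

n_s = 12
total = 2 ** n_s  # 4096 equiprobable arrangements
# Probability of each combination, keyed by the number of males
dist = {males: Fraction(comb(n_s, males), total) for males in range(n_s, -1, -1)}

for males in (12, 11, 10):
    print(f"{males} m, {n_s - males} f: {comb(n_s, males)}/{total} = {float(dist[males]):.5f}")
```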


* Cf. M. Philip, The Principles of Financial and Statistical Mathematics, Prentice-Hall, New York, 1941, p. 222, for the development of the factorial formula, nCr, for determining the value of any coefficient of a binomial.






The theoretical sampling distribution for an indefinitely prolonged series 
of random samples in which the size of each sample is 12 cases is shown in D 
in Fig. 12:1. The mean frequency of males is 6.0 and the standard deviation 
is 1.73. Although D is still rather peaked, a resemblance between it and the 
normal bell-shaped distribution is suggested. Furthermore, the probabilities 
of extreme combinations of results are so small as to be exceedingly unlikely 
when only a few samples are drawn.* The probability of all males in random 
samples of 12 cases each is only 1/4096, a proportion whose P value is .0002+. 
In other words, in a prolonged series of random samples, we would expect 
such an extreme result in less than 3 out of every 10,000 samples. By any 
standards of what experience shows to be likely or unlikely, as applied to 
probable inference in sampling statistics, such a result would be most unlikely 
for only one or even several random samples. The combination of 11 males 
and 1 female, with a P value of 12/4096, or .0029+, would be expected in 
the long run in slightly less than 3 samples per 1000. This result would also 
be most unlikely for only one or a few samples. 
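The "indefinitely prolonged series of random samples" can be approximated by simulation, in the spirit of the coin-tossing experiment suggested in the footnote. This sketch is illustrative only. It assumes Python's `random` module as the source of equiprobable draws, treats each of 12 random bits as one person (1 = male), and uses an arbitrary seed. The simulated rate should come out near 1/4096, but it will not match that value exactly.

```python
import random


def simulated_all_male_rate(num_samples, n_s=12, seed=1947):
    """Fraction of simulated random samples of n_s persons that are all male,
    assuming a universe in which males and females are equiprobable."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    all_male_pattern = (1 << n_s) - 1  # binary 111...1: every person male
    hits = 0
    for _ in range(num_samples):
        # getrandbits(n_s) yields n_s independent equiprobable bits, one per person
        if rng.getrandbits(n_s) == all_male_pattern:
            hits += 1
    return hits / num_samples


rate = simulated_all_male_rate(400_000)
print(f"simulated: {rate:.6f}   theoretical: {1 / 4096:.6f}")
```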

It should be noted that the possibility of even the most extreme result 
appearing, on the basis of chance, in a single random sample is not ruled out. 
However, it should be emphasized that such extreme results are unlikely. 
The concept of what is likely or unlikely in sampling and measurement is 
thus integral to evaluating the result of a single sample, or of only a few 
samples. In studying universes by means of an analysis of sample data, a 
distinction must be made between what is likely on the basis of chance and 
what is unlikely on the basis of chance. Distribution D in Fig. 12:1 describes 
what can happen on the basis of chance in drawing an indefinitely prolonged 
series of random samples of persons, 12 at a time, from a large universe in 
which the proportions of men and women are assumed to be equal. If the 
samples are drawn randomly, the laws of chance should operate normally 
and give the variations in results shown in the figure. But if we drew only a 
single random sample of 12 persons from a universe whose male and female 
composition is unknown, we would not consider a sample yielding 12 males 
or 11 males and 1 female a likely result for a universe assumed to be evenly 
divided with respect to the two sexes. In fact, either we would reject the 
hypothesis that the sample was randomly drawn from such a universe and 
suspect the presence of bias in the sample itself, or, if the sample were obtained by a truly random technique, we would conclude that the particular universe sampled contained a greater proportion of males than of females.

* The student can test this for himself by tossing 12 coins simultaneously until he obtains all heads or all tails on a toss.

The Expansion of the Binomial for the Normal Probability Curve

It may be evident from the distributions in Fig. 12:1 that if the samples are large enough, the expansion of (p + q)^n, where p = 1/2, will yield sampling distributions that should increasingly approach the continuity or smoothness characteristic of the normal probability curve. That this is the case is mathematically demonstrable.* The expansion of (p + q)^n, when n is very large, will yield a distribution that, fortunately, can be obtained more readily by the equation of the normal probability function:

$$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2/2\sigma^2}$$

where x is a deviation from the mean, σ is the standard deviation, and the total area of the distribution is taken as equal to unity.

The normal probability curve is a smooth, continuous distribution rather than the succession of discrete classes characteristic of the point binomial. For the binomial, however, when Ns, the size of the sample, is very large, the relative differences in the frequencies of results for successive combinations become very small. If Ns is taken without limit, these differences become infinitely small, and in the limit the expansion becomes identical with the normal probability distribution. It should be observed, however, that samples do not have to contain many more than 25 or 30 cases to give an expanded binomial that for all practical purposes can be treated as if it were a perfectly smooth, continuous normal probability distribution.
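The following is a rough numerical check of the claim that the expanded binomial approaches the normal curve. The sketch compares exact binomial probabilities for Ns = 100 and p = 1/2 with the ordinate of the normal probability function, using mean Ns p and standard deviation √(Ns p q). The sample size and the values of k are arbitrary choices for illustration, and the function names are hypothetical.

```python
import math
from math import comb


def binomial_prob(n_s, k, p=0.5):
    """Exact probability of k p events (males) in a sample of n_s."""
    return comb(n_s, k) * p**k * (1 - p) ** (n_s - k)


def normal_ordinate(x, sigma):
    """Height of the unit-area normal curve at deviation x from the mean."""
    return math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))


n_s, p = 100, 0.5
mean = n_s * p                        # 50
sigma = math.sqrt(n_s * p * (1 - p))  # 5.0
for k in (50, 55, 60, 65):
    print(k, round(binomial_prob(n_s, k, p), 5), round(normal_ordinate(k - mean, sigma), 5))
```

At Ns = 100 the two columns already agree to about three decimal places.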

The Probability of a Result Derived from the Normal 
Probability Distribution 

How are the implications of the normal probability curve to be utilized in determining the probability of a given sample result? We have seen that by means of the binomial expansion we can calculate the probability of any combination of males and females in small random samples drawn from a large universe equally divided between the sexes. But what if we draw random samples of 100 cases each? What will be the probability of obtaining at least 60 males in random samples of a universe in which the two sexes are equally divided? The formulation of the probability estimate here is somewhat different from that for the combinations of males and females in the point binomial. When utilizing the implications of the normal probability distribution in estimating a probability value, we estimate the probability of a range of results rather than of a single value. In other words, because a normal distribution is continuous rather than discrete, we estimate the probability of obtaining at least 60 males, or 60 or more males, rather than the probability of obtaining the precise combination of 60 males and 40 females.

To make probability estimates on the basis of the normal probability curve, 
we need to have (1) a measure of the variability of the sampling distribution 
under consideration and (2) a differentiation of the area (or frequencies) of 
the normal probability distribution in terms of its measure of variability. 

* Cf. C. C. Peters and W. R. Van Voorhis, Statistical Procedures and Their Mathematical Bases, McGraw-Hill, New York, 1940, pp. 279-286.





The standard measure of variability is the standard deviation, and Table I, Appendix B, gives a differentiation of the normal probability distribution in terms of x/σ, or z.

The standard deviation of the sampling distribution of any statistic is called its standard error. The standard error for the present problem is σf = √(Ns p q), where Ns is the size of the sample, p the proportion of males (or p events) in the universe under consideration, and q = 1 - p. In this case p is thus the P value for males, viz., 1/2 or .50. Where Ns, the size of the sample, is equal to 100,

$$\sigma_f = \sqrt{100(.50)(.50)} = 5.0$$

A result of 60 or more males in a random sample of 100 cases is greater than the mean frequency of males in the universe. It will be recalled that the mean frequency for random samples of a universe is equal to Mf = Ns p, where Ns is the size of the sample and p is the parameter proportion for the universe under consideration. In this case,

$$M_f = 100(.50) = 50$$

By hypothesis, then, this value is taken as the parameter mean frequency for a sampling distribution when Ns = 100 and p = .50. The parameter mean frequency is located at the modal point of the normal sampling distribution, as indicated in Fig. 12:2. The mean of the hypothetical, normal sampling distribution therefore is taken to coincide in value with the parameter mean frequency of males in the given universe.

A Test of Significance (T) 

A sample result of 60 or more males would lie in the area of the tail of the normal distribution that is 2.0 standard deviation units above 50, when σf = 5.0 and Mf is 50. This is so because (60 - 50)/5.0 = 2.0. This relationship, it will be observed, is similar to that developed earlier for x/σ, or z scores. Thus,

$$z = \frac{X - M_x}{\sigma_x}$$

[8:1] z score

This z score formula is for converting the original scores of a distribution to positions on a scale in standard deviation units, x/σ. We are now concerned, however, not with the difference between a particular original score, X, and the mean of an obtained distribution, Mx, but with the difference between the statistic of a sample result (in this case, fs, where fs is the frequency of males obtained in the sample) and the parameter frequency of a hypothetical universe (in this case, Mf, the parameter mean frequency of males). The difference between the sample frequency of males and the parameter mean frequency is

$$f_s - M_f = 60 - 50 = 10$$





Since the parameter value of a measure is located at the mean of its sampling distribution, we shall employ the symbol fh instead of Mf, the subscript h standing for hypothesis. The symbol fh therefore complements the symbol fs. The latter indicates the frequency of a class of events in the sample result (in this case, males), and fh indicates the parameter frequency of that class of events for the universe under consideration (in this case, a proportion of males equal to .50, and therefore a parameter frequency of 50 when Ns = 100). The difference between the sample frequency and the frequency by hypothesis will therefore be symbolized as fs - fh.

In Formula 8:1, for z scores, the denominator represents the standard deviation of the obtained distribution of scores. In the present situation, however, σf represents the standard deviation of the sampling distribution when fh = 50 and Ns = 100. This standard deviation is called the standard
error of the statistic, i.e., the standard error of the frequency of males in 
random samples for the universe of the hypothesis. It measures the variability 
of the distribution of the results of random samples, and hence serves as the 
yardstick for measuring the variation in sample results to be expected upon 
the basis of chance errors in sampling and measurement. 

We shall symbolize this new relationship by T, where T stands for the test ratio of a Test of Significance when Ns is large, and the normal probability distribution of large sample theory describes the form of the sampling distribution.

$$T = \frac{(\text{sample measure}) - (\text{parameter measure})}{\text{standard error of the measure}}$$

[12:9] General form of a Test of Significance


In this case, where the measure under consideration is the frequency of a class of events,

$$T = \frac{f_s - f_h}{\sigma_f}$$

[12:10] Test of Significance for frequencies

and therefore, when Ns = 100, fs = 60, fh = 50, and σf = 5.0:

$$T = \frac{60 - 50}{5.0} = 2.0$$


If we now consult Table I, Appendix B, for the differentiation of the normal probability curve in terms of x/σ, we see that when x/σ = 2.0, 47.72% of the total area lies between the mean and a point two standard deviation units above it. Hence the proportion of the area above x/σ = 2.0 is 50% - 47.72% = 2.28%. This is shown in Fig. 12:2.

This value of 2.28%, expressed as a proportion equal to .0228, is the P value that we set out to obtain. It is the probability value for the given T ratio of 2.0. It gives, for a normal sampling distribution, the relative frequency with which random samples drawn 100 at a time will yield results of 60 or more males, when the universe sampled is, by hypothesis or by knowledge, divided equally between males and females. On the basis of chance, which operates in all sampling, but which operates in a lawful way in random sampling, we would expect to obtain 60 or more males in approximately 23 random samples per thousand samples (or 2.3 per hundred), when Ns = 100.

To obviate computing a probability value for any value of T by the process 
of subtraction just used, we have set up a special table (Table II, Appendix B) 
which gives these values for T from .00 to 3.0. Thus the P value for a T ratio 
of 2.0 can be read directly from the table as equal to .0228. 
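The steps from σf to the P value can be collected in a few lines. This is a sketch, not the book's procedure. Instead of reading Table II, it computes the upper-tail area with `math.erfc`, assuming that the sampling distribution is the normal curve of large sample theory. The function names are hypothetical.

```python
import math


def upper_tail(t):
    """Proportion of the normal curve's area beyond t standard deviations
    above the mean (the kind of value read from Table II)."""
    return 0.5 * math.erfc(t / math.sqrt(2))


def frequency_test(f_s, n_s, p):
    """Test of Significance for a frequency (Formula 12:10)."""
    f_h = n_s * p                           # parameter frequency by hypothesis, Ns p
    sigma_f = math.sqrt(n_s * p * (1 - p))  # standard error, sqrt(Ns p q)
    t_ratio = (f_s - f_h) / sigma_f
    return t_ratio, upper_tail(t_ratio)


T, P = frequency_test(f_s=60, n_s=100, p=0.5)
print(f"T = {T:.1f}, P = {P:.4f}")  # prints T = 2.0, P = 0.0228
```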

The Evaluation of the Test of Significance 

Would a P value of .023 for a single sample result be considered as indicative of a result that is likely to occur on the basis of chance alone? A categorical answer of yes or no cannot be given to this question when T = 2.0 and hence P = .023. Generally, however, in psychological and social science statistics a result with a P value of .023 is judged to be either (1) likely on the basis of chance, or (2) doubtful because experience has indicated that such a P value is not definitely indicative of results that “just don’t happen” in the ordinary course of events. If the result is considered doubtful, we cannot
be confident that it is either likely or unlikely on the basis of chance; therefore, 
additional sampling evidence will be needed. If the result is considered 
likely, this means that 60 or more males in a random sample of 100 persons 
is judged to be a reasonable expectancy for a universe equally divided between 
males and females. Such a conclusion would in effect say that the difference 
between the sample result of 60 males and the parameter mean frequency 
of 50 males is not a significant difference; rather, it is a difference likely on the 
basis of chance errors of sampling. 

Let us consider some of the implications of the preceding with respect to polling the preferences of a large universe of voters for two political candidates. The parameter frequency of the voters’ preferences for either Candidate A or Candidate B is of course unknown (if the parameter frequency were known, no poll would be necessary). If A receives 60 of the preferences in a random sample of 100 voters of the universe polled, we should either (1) doubt that the preferences of all the voters in the universe would give a majority for A, or (2) conclude that the sample result is too likely a result under an even division of voters’ preferences in the universe to warrant the inference that A will win. This is so because testing for an even division of voters’ preferences requires a Test of Significance in which the parameter frequency of preferences for A is taken as equal to 50%. But if a sample frequency of 60% is a likely result for the hypothesis that the preference for A is equal to 50%, we cannot be confident that in the election A will get more than 50% of the votes.

If, however, there were three candidates in the race, a plurality were sufficient to elect, and the poll results for B and C were evenly divided, 20 and 20, the conclusion that A’s election is likely would be warranted, since A would not need as much as 50% of the total vote in order to win, and a sample result of 60% for A would most likely indicate a plurality over either B or C.

The Distribution of Frequencies in the Normal Probability Distribution

In order to show the form of the normal probability distribution when 
drawn on a scale comparable to that used for the small-sample sampling 
distributions of the expanded binomials in Fig. 12:1, the total area of the 
sampling distribution in Fig. 12:2 has been made approximately the same. 

Fig. 12:2. The Theoretical Sampling Distribution of the Binomial (p + q)^100 for
Ns = 100, When p and q Are Equiprobable and the Expansion of the Binomial Is
Based on the Normal Probability Function of Large Sample Theory, with the Total
Area of the Distribution Taken as 100% and Scaled the Same as the Distributions
in Fig. 12:1.



The distribution of the data in this figure is given in Table 12:2, and is based on the normal probability function (Formula 8:2). The proportion of the area of the normal probability distribution for any distance on the abscissa above or below the mean is given in terms of x/σ in Table I, Appendix B.

The sampling distribution in Fig. 12:2 is for random samples drawn 100 at a time; thus, Ns = 100. If the distribution were set up from the binomial, instead of on the basis of the normal probability function, the expansion would be equal to (p + q)^100, with p (males) and q (females) equiprobable in the universe sampled. Under these circumstances, as we have already indicated, the parameter mean frequency of males is 50, and the standard error of the sampling distribution, σf, is 5.0. Hence, between the mean of 50 males and a point one standard deviation above it, there will be an increase of 5 males per sample. In other words, at T = 1.0, fs = 55 males; at T = 2.0, fs = 60 males; at T = -1.0, fs = 45 males, etc. An increase or decrease of 1 male in a sample is therefore equal to a change of 0.2 standard deviation unit on the abscissa scale. Thus, at T = 0.2, fs = 51; at T = 0.4, fs = 52, etc. Consequently, in the distribution in Fig. 12:2, the successive increases in frequencies of males per sample are scaled into class intervals equal to 0.2σ.

Table 12:2. Theoretical Percentage Distributions and P Values of Sample
Results for Binomial Based on the Normal Probability Function, When Males (p)
and Females (q) Are Equiprobable and the Size of Random Samples, Ns, Is 100

Frequency    Class Interval     % Frequency    Probability
of Males     in Terms of        of Sample      Value
per Sample   Limits of σ        Results        P

65 to 100     3.0 to ∞          (0.135)        (.00135)
64            2.8 to 3.0-        0.125          .00125
63            2.6 to 2.8-        0.21           .0021
62            2.4 to 2.6-        0.35           .0035
61            2.2 to 2.4-        0.57           .0057
60            2.0 to 2.2-        0.89           .0089
59            1.8 to 2.0-        1.31           .0131
58            1.6 to 1.8-        1.89           .0189
57            1.4 to 1.6-        2.60           .0260
56            1.2 to 1.4-        3.43           .0343
55            1.0 to 1.2-        4.36           .0436
54            0.8 to 1.0-        5.32           .0532
53            0.6 to 0.8-        6.23           .0623
52            0.4 to 0.6-        7.04           .0704
51            0.2 to 0.4-        7.61           .0761
50            0.0 to 0.2-        7.93           .0793
49           -0.2 to 0.0-        7.93           .0793
48           -0.4 to -0.2-       7.61           .0761
47           -0.6 to -0.4-       7.04           .0704
46           -0.8 to -0.6-       6.23           .0623
45           -1.0 to -0.8-       5.32           .0532
44           -1.2 to -1.0-       4.36           .0436
43           -1.4 to -1.2-       3.43           .0343
42           -1.6 to -1.4-       2.60           .0260
41           -1.8 to -1.6-       1.89           .0189
40           -2.0 to -1.8-       1.31           .0131
39           -2.2 to -2.0-       0.89           .0089
38           -2.4 to -2.2-       0.57           .0057
37           -2.6 to -2.4-       0.35           .0035
36           -2.8 to -2.6-       0.21           .0021
35           -3.0 to -2.8-       0.125          .00125
34 to zero   -3.0 to -∞         (0.135)        (.00135)

Total                           100%           1.00
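The percentage column can be recomputed from the normal probability function. The sketch below assumes the class intervals of the table: each is 0.2σ wide, and the kth interval above the mean corresponds to fs = 50 + k. It uses `math.erf` in place of Table I. The printed figures were probably obtained by subtracting rounded four-place table entries, so a few recomputed values differ from them in the second decimal place. For example, the sketch gives 7.62 rather than 7.61 for 51 males.

```python
import math


def normal_cdf(z):
    """Proportion of the normal curve's area below z (in sigma units)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


STEP = 0.2        # width of each class interval, in sigma units
N_INTERVALS = 15  # intervals from 0.0 up to 3.0 sigma above the mean

rows = []
for k in range(N_INTERVALS):
    lower, upper = k * STEP, (k + 1) * STEP
    pct = 100 * (normal_cdf(upper) - normal_cdf(lower))
    rows.append((50 + k, lower, upper, pct))  # 50 + k males per sample

tail_pct = 100 * (1 - normal_cdf(3.0))  # area beyond +3.0 sigma

for males, lower, upper, pct in rows:
    print(f"{males:>3}  {lower:.1f} to {upper:.1f}-  {pct:5.2f}%  P = {pct / 100:.4f}")
print(f"65 or more  3.0 and beyond  {tail_pct:.3f}%")
```

By symmetry, the rows below the mean repeat these values.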

The percentage of the total area of the normal probability distribution between the mean and any distance above or below it, in terms of x/σ (or T), can readily be obtained by referring to Table I, Appendix B. Thus, when x/σ = 0.2, 7.93% of the total area is found to lie between the mean and 0.2σ. Since the distribution is bilaterally symmetrical, the same percentage of the area lies between M and -0.2σ. The percentages of the total area for successive intervals, taken 0.2σ at a time and given in Table 12:2, are not differentiated beyond ±3.0σ. Sample frequencies of fewer than 35 or more than 65 males are not indicated in Fig. 12:2. The reason for stopping at ±3.0σ is obvious: the percentage of possible sample results beyond these points is so small that they cannot be differentiated on the graph. The likelihood of such extreme results from random samples of the universe under consideration is remote. Of the total area of the normal probability curve, 99.73% lies between M ± 3.0σ. Results beyond M ± 3.0σ, for an indefinitely prolonged series of random samples, will occur fewer than 3 times in 1000, since 100% - 99.73% = 0.27%. Half of this remainder, or 0.135%, is the percentage of the area beyond 3.0σ, and consequently P = .00135 is the probability value for 65 or more males in random samples when Ns = 100. Similarly, P = .00135 is the probability value for 34 or fewer males in such samples.

From our earlier discussion of what is likely or unlikely on the basis of chance in one or only a few random samples of a defined universe, it should be evident that in the above situation samples consisting of 2/3 or more males, or of 1/3 or fewer males, are extremely unlikely. If such an extreme result as
70 males were obtained in a random sample of a universe whose parameter 
mean frequency was unknown but was taken at 50 by hypothesis for a Test 
of Significance, we would with confidence reject the hypothesis and conclude 
that the sample of 70 was drawn from a universe whose parameter mean 
frequency was greater than 50. 

C. SMALL SAMPLE THEORY—LEPTOKURTIC SAMPLING
DISTRIBUTIONS

Not all sampling distributions take the form of the bell-shaped normal probability function. That this is the case for small samples was shown in Fig. 12:1. Although these sampling distributions are uni-modal and bilaterally symmetrical, their form obviously would not correspond to that of the normal probability curve even if they were transformed from histograms into smooth continuous curves running through ordinate points at the mid-points of each interval. The normal probability function, differentiated for x/σ in Table I, Appendix B, is thus not satisfactory for describing how the areas of small-sample sampling distributions are distributed. Hence, probability estimates for results obtained from random small samples are based on different tables of probability values from those used with large sample results whose statistics are normally distributed.

It should be emphasized that the bell-shaped probability curve, based on Formula 12:5, is not the only “normal” sampling distribution. On the contrary, there is a sampling distribution “normal” for every given size of sample and given kind of statistic. Thus, the sampling distributions in Fig. 12:1 are the “normal” probability distributions for the point binomial when Ns = 2, 3, 6, and 12. Unless this fact is recognized, the concept “normal probability distribution” is likely to be ambiguous. What is normal for one situation may be non-normal for another. Generally, however, the term normal probability distribution is employed to refer to the standard bell-shaped frequency curve of large sample theory, based on Formula 8:2. When a sampling distribution takes a different form, the difference should be reported.

Kurtosis (Ku) 

As samples decrease in size, their respective sampling distributions are much more peaked (Fig. 12:1) than the normal probability distribution of large sample theory (Fig. 12:2). The technical term in statistics for differences in the distribution of the area (or frequencies) about the mean of a uni-modal, bilaterally symmetrical curve is kurtosis (from the Greek, meaning “overarching”). The normal probability curve of large sample theory is mesokurtic (meso from the Greek mesos, meaning “middle”), whereas the peaked distributions in Figs. 12:1-12:4 are leptokurtic (lepto from the Greek leptos, meaning “slender”). Occasionally distributions are considerably flattened throughout the middle; these are described as platykurtic (platy from the Greek platos, meaning “flat”). Tests of Significance for analyzing the kurtosis of a distribution are developed in Chapter 13, Section D. Such tests make it possible to judge whether the divergence of a given distribution from mesokurtosis is significant.

The sampling distributions of most statistics in large sample theory can be assumed to be similar in form to the normal probability curve (Fig. 12:2), provided the universe sampled is itself large. Probability estimates can therefore be made for such statistics from a differentiation of the one curve, as in Table II, Appendix B. In small sample theory, however, the form of the sampling distributions changes somewhat as Ns is increased or decreased by only one case. Hence a differentiation of the area of one curve for small samples of differing sizes is not adequate for probability estimates, and a different table of probability values is therefore required for different values of Ns.


t Statistic

A satisfactory treatment of the varying forms of the distributions of small samples was first presented by William S. Gosset, an English scholar, in an article published under the pseudonym “Student” in 1908.* A table of probability values for small samples developed by R. A. Fisher from Student’s distribution is presented in Table III, Appendix B. As will be indicated later, this table is employed in Tests of Significance for random samples when Ns is less than 25 or 30. When such tests are made for samples as small as these, t instead of T is used to symbolize the difference in the sampling situation.

This distinction between t and T is the basis for understanding the implications of Fisher’s t statistic of small sample theory. The symbol t signifies the test ratio of a Test of Significance of a statistic derived from a small sample of observations or measurements; in large sample theory, this ratio is represented by T.

* Student, “The Probable Error of a Mean,” Biometrika, 6:1-25, 1908.

When Is a Sample Small? 

There is no sharp line of division between “small samples” of small sample theory and “large samples” of large sample theory. However, inspection of Table III, Appendix B, reveals that the probability values of t for sampling distributions based on 25 to 30 cases are practically identical with those based on infinitely large samples. Consequently a value of from 25 to 30 for Ns is generally taken as the basis for distinguishing between the samples of small sample theory and those of large sample theory. That this does not harmonize with the use of such terms as small and large samples in other connections may be apparent. A sample of 100 cases, for example, would be a relatively small sample of all the voters in the United States; a moderately large sample of such a universe would consist of several thousand cases. The sense of “small” and “large” here is of course different from that in small sample theory and large sample theory. Small sample theory indicates that the form of the sampling distribution is increasingly leptokurtic when the size of the sample is taken as less than 25 or 30 cases. If a sample of 100 cases is relatively small for a particular investigation, the implications of large sample theory, rather than small sample theory, are nevertheless used in evaluating the result.
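The heavier tails of the small-sample curves can be seen by comparing the area beyond 2.0 under Student's t curve with the area beyond 2.0 under the normal curve. This sketch rests on several assumptions. It uses the standard form of Student's density, indexed by degrees of freedom (`df`), a quantity not yet introduced in this chapter and treated here simply as the curve's parameter. It also integrates the density numerically with Simpson's rule rather than reading Table III. The function names are hypothetical.

```python
import math


def t_density(t, df):
    """Ordinate of Student's t curve with df degrees of freedom."""
    # lgamma avoids overflow in the gamma functions for large df
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """Area beyond t (t > 0): 0.5 minus the area from 0 to t, by Simpson's rule."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_density(0, df) + t_density(t, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2  # Simpson weights 4, 2, 4, ..., 4
        total += weight * t_density(i * h, df)
    return 0.5 - total * h / 3


normal_tail = 0.5 * math.erfc(2.0 / math.sqrt(2))
for df in (5, 10, 30):
    print(f"df = {df:>2}: area beyond 2.0 = {t_upper_tail(2.0, df):.4f}")
print(f"normal curve: area beyond 2.0 = {normal_tail:.4f}")
```

The tail area shrinks toward the normal value as df grows. This is consistent with the text's point that the sampling distribution for 25 to 30 cases differs little from that of large sample theory.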

D. SKEWED SAMPLING DISTRIBUTIONS AND NORMAL 
PROBABILITY 

The standard, normal probability distribution in Fig. 12:2 is more general than the binomial distribution. This is the case because an asymmetrical binomial distribution, and some other types of non-normal distributions, approach the symmetrical, normal probability curve when large random samples are drawn from a large universe. On the other hand, the sampling distributions of some statistics are skewed, regardless of whether the samples are small or large, or drawn randomly from small or large universes. This is true, for example, of the sampling distributions of correlation coefficients as their parameter values approach 1.00 or -1.00. High values of r yield sampling distributions that are increasingly skewed and leptokurtic. However, as the parameter values of r approach zero, the sampling distributions of correlation coefficients increasingly approach the standard, normal probability form. The sampling distribution for rh = zero is normal. This latter point is of considerable importance because, as we shall see later, one of the primary Tests of Significance for correlation coefficients is for the hypothesis that rh is zero (the subscript h symbolizing the parameter value of the hypothesis tested).





PROBABILITY AND STATISTICAL INFERENCE 


The Binomial When p ≠ q

The basis in sampling for skewed distributions can readily be seen from the expansion of the binomial when p and q are not equiprobable, i.e., when p ≠ q.

Fig. 12:3. The Theoretical Sampling Distributions of the Binomial (p + q)^N_s When p ≠ q (with p = 3/4 and q = 1/4), with the Size of the Samples, N_s, Equal to 2, 6, and 16, and with the Total Areas of Each Distribution the Same (100%)

[Figure: panels A (N_s = 2), B (N_s = 6), and C (N_s = 16); abscissa: frequency of p (males) per sample.]

Furthermore, the greater the difference between the values of p and q, the greater the amount of skewness in sampling distributions based on a given size of sample. It may appear paradoxical that, if samples of large universes are sufficiently large, the skewness of sampling distributions when p ≠ q becomes negligible, so that the results may for all practical purposes be treated as if the distribution were of the standard, normal form. However, this can be demonstrated mathematically;* it is indicated graphically by the distributions in Fig. 12:3.

* Cf. J. G. Smith and A. J. Duncan, Sampling Statistics and Applications, McGraw-Hill, New York, 1945, especially chap. 4.
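The claim that skewness fades as N_s grows can be checked numerically. The sketch below assumes the standard moment coefficient of skewness of a binomial, (q − p)/√(N_s pq), a textbook result that is not given in this chapter; the function name `binomial_skewness` is ours, for illustration only.

```python
import math

def binomial_skewness(n_s: int, p: float) -> float:
    """Moment coefficient of skewness of the binomial (p + q)**n_s (assumed formula)."""
    q = 1.0 - p
    return (q - p) / math.sqrt(n_s * p * q)

# p = 3/4 (males outnumber females 3 to 1). Negative values mean the longer
# tail lies toward fewer males; the magnitude shrinks like 1/sqrt(n_s).
for n_s in (2, 6, 16, 100):
    print(n_s, round(binomial_skewness(n_s, 0.75), 3))
```

With p = q = 1/2 the coefficient is exactly zero, which matches the symmetrical binomial of the equiprobable case.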





If, instead of drawing random samples from a large universe equally divided between males and females, we draw them from a universe in which males outnumber females by 3 to 1, the probability of males (p) will be 3/4, and of females (q), 1/4. Let us see what happens in the expansion of the binomial for small samples when N_s = 2, N_s = 6, and N_s = 16. The theoretical sampling distribution will be as follows when N_s = 2:


$$(p + q)^{N_s} = \left(\tfrac{3}{4} + \tfrac{1}{4}\right)^2 = \tfrac{9}{16} + \tfrac{6}{16} + \tfrac{1}{16}$$

$$P = .5625 + .3750 + .0625 = 1.0$$

(2 m)    (1 m, 1 f)    (2 f)

The respective probability values of these three results range from P = .56 to P = .06, and the theoretical sampling distribution (“normal” for this situation in sampling) is graphed in A in Fig. 12:3. The distribution is clearly asymmetrical, with the skewed portion (or extended tail) in the direction of fewer males in the sample results. In fact, the distribution has no central tendency: its mode is at one end of the distribution. The parameter mean frequency is equal to N_s p, as for the binomial when p = 1/2; however, p is now equal to 3/4. Hence M_f = 2(3/4) = 1.5 males. This is obviously not the value of the modal interval (or point). The mode (M_o) of a binomial is equal to the following:

$$M_o = \text{the integer value between } N_s p - q \text{ and } N_s p + p \qquad [12{:}11]$$

Mode of a binomial distribution

Thus, in the preceding example,

$$N_s p - q = 2\left(\tfrac{3}{4}\right) - \tfrac{1}{4} = 1.25$$

and

$$N_s p + p = 2\left(\tfrac{3}{4}\right) + \tfrac{3}{4} = 2.25$$

hence M_o = 2.0, since this is the integer value between 1.25 and 2.25.
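Formula 12:11 can be checked against a direct search for the most probable frequency. This is a minimal sketch; `binomial_pmf`, `mode_by_formula`, and `mode_by_search` are illustrative names, not from the text. The interval from N_s p − q to N_s p + p has width exactly 1, so it contains one integer unless both end points are integers, in which case the two end points are tied modes and the sketch returns the lower one.

```python
import math

def binomial_pmf(n_s: int, k: int, p: float) -> float:
    """Probability of exactly k p events (males) in a sample of n_s."""
    return math.comb(n_s, k) * p**k * (1 - p) ** (n_s - k)

def mode_by_formula(n_s: int, p: float) -> int:
    """Formula 12:11: the integer lying between N_s*p - q and N_s*p + p."""
    q = 1 - p
    lower = n_s * p - q
    # Smallest integer >= lower; it cannot exceed lower + 1 = N_s*p + p.
    return math.ceil(lower)

def mode_by_search(n_s: int, p: float) -> int:
    """The frequency with the largest binomial probability."""
    return max(range(n_s + 1), key=lambda k: binomial_pmf(n_s, k, p))

for n_s in (2, 6, 16):
    print(n_s, mode_by_formula(n_s, 0.75), mode_by_search(n_s, 0.75))
```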
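When N_s = 6, the expansion of (p + q)^6, for p = 3/4, is as follows:

$$\left(\tfrac{3}{4}+\tfrac{1}{4}\right)^6 = \left(\tfrac{3}{4}\right)^6 + 6\left(\tfrac{3}{4}\right)^5\left(\tfrac{1}{4}\right) + \frac{6\cdot5}{1\cdot2}\left(\tfrac{3}{4}\right)^4\left(\tfrac{1}{4}\right)^2 + \frac{6\cdot5\cdot4}{1\cdot2\cdot3}\left(\tfrac{3}{4}\right)^3\left(\tfrac{1}{4}\right)^3 + \frac{6\cdot5\cdot4\cdot3}{1\cdot2\cdot3\cdot4}\left(\tfrac{3}{4}\right)^2\left(\tfrac{1}{4}\right)^4 + \frac{6\cdot5\cdot4\cdot3\cdot2}{1\cdot2\cdot3\cdot4\cdot5}\left(\tfrac{3}{4}\right)\left(\tfrac{1}{4}\right)^5 + \left(\tfrac{1}{4}\right)^6$$

$$P = .1780 + .3560 + .2966 + .1318 + .0330 + .0044 + .0002 = 1.0$$

(6 m)   (5 m, 1 f)   (4 m, 2 f)   (3 m, 3 f)   (2 m, 4 f)   (1 m, 5 f)   (6 f)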
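The terms of this expansion can be reproduced exactly with rational arithmetic. In this sketch (the name `binomial_terms` is illustrative) each term is C(N_s, k) p^k q^(N_s − k), listed from all males down to no males, as in the text.

```python
import math
from fractions import Fraction

def binomial_terms(n_s: int, p: Fraction) -> list[tuple[int, Fraction]]:
    """Terms of (p + q)**n_s, ordered from n_s p events (males) down to none."""
    q = 1 - p
    return [(k, math.comb(n_s, k) * p**k * q ** (n_s - k)) for k in range(n_s, -1, -1)]

terms = binomial_terms(6, Fraction(3, 4))
for males, prob in terms:
    print(f"{males} m, {6 - males} f: {prob} = {float(prob):.4f}")
print("total:", sum(prob for _, prob in terms))
```

Exact fractions make it plain that the seven terms sum to exactly 1, while the four-place decimals match the P values above.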

The probabilities of these combinations run from P = .18 for all males down to P = .0002 for no males. Graph B in Fig. 12:3 shows the form of the sampling distribution. The distribution is still skewed, but not so much as in A, when N_s = 2. The parameter mean frequency of males is

$$N_s p = 6\left(\tfrac{3}{4}\right) = 4.5$$

and the parameter modal frequency of males is obtained by means of Formula 12:11:

$$N_s p - q = 6\left(\tfrac{3}{4}\right) - \tfrac{1}{4} = 4.25$$

$$N_s p + p = 6\left(\tfrac{3}{4}\right) + \tfrac{3}{4} = 5.25$$

Hence, M_o = 5.0

Thus, the modal frequency of males has shifted one interval toward the center of the distribution. Although the difference between the mean and the mode is still 0.5, this difference is smaller relative to the entire dispersion of the sampling distribution for N_s = 6 than it is for the smaller sample, N_s = 2.

Graph C in Fig. 12:3 shows the form of the theoretical sampling distribution when N_s = 16 and p = 3/4. The mean is

$$N_s p = 16\left(\tfrac{3}{4}\right) = 12.0$$

and the parameter modal frequency is:

$$N_s p - q = 16\left(\tfrac{3}{4}\right) - \tfrac{1}{4} = 11.75$$

$$N_s p + p = 16\left(\tfrac{3}{4}\right) + \tfrac{3}{4} = 12.75$$

Hence, M_o = 12.0

In the point binomial for p or q = 3/4, the mean and mode have the same integer value when N_s is a multiple of 4. But for a continuous distribution that is negatively skewed, the mean and mode are not identical in value; they lie in the same integral interval, with the modal value of males slightly larger than the mean value.

When N_s = 16 and p = 3/4, the figure shows that the mean frequency of the sampling distribution is shifted four intervals from the extreme of 16 males. Furthermore, approximately 99% of the results of random samples may be expected to lie within the limits of the mean interval (12 m) and four intervals above and four below, i.e., between 8 males and 16 males. Although the sampling distribution is negatively skewed, fewer than 1 sample in 100 may be expected to contain fewer than 8 males. The skewness is thus considerably reduced when N_s is as large as 16. The P values for p events (or males) are as follows:

16 m, P = .0100      10 m, P = .1101      4 m, P = .00003
15 m, P = .0535       9 m, P = .0524      3 m, P = .000004
14 m, P = .1336       8 m, P = .0197      2 m, P = .0000003
13 m, P = .2079       7 m, P = .0058      1 m, P = .00000001
12 m, P = .2252       6 m, P = .0014      0 m, P = .0000000002
11 m, P = .1802       5 m, P = .0002



The P values for the last six intervals at the lower end of the distribution are 
too small to be shown on Graph C. 
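The P values listed above can be recomputed directly from the binomial. This sketch (names illustrative) also sums the probabilities from 8 through 16 males. Computed this way, the value for 10 males rounds to .1101 and that for 5 males to .0002, and the total from 8 through 16 males is about .9925, in line with the "approximately 99%" stated above.

```python
import math

def binomial_pmf(n_s: int, k: int, p: float) -> float:
    """Probability of exactly k p events (males) in a sample of n_s."""
    return math.comb(n_s, k) * p**k * (1 - p) ** (n_s - k)

N_S, P = 16, 0.75
# Ordered from 16 males down to none, as in the list above.
probs = {k: binomial_pmf(N_S, k, P) for k in range(N_S, -1, -1)}

for k, prob in probs.items():
    print(f"{k:2d} m, P = {prob:.10f}")

# Share of samples expected to fall between 8 and 16 males inclusive.
within = sum(probs[k] for k in range(8, N_S + 1))
print(f"P(8 <= males <= 16) = {within:.4f}")
```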



PRECISION OF RESULTS AND SAMPLE SIZE 




As in the case of Fig. 12:1, where p = 1/2, the sampling distributions in Fig. 12:3 are leptokurtic, but they become less so as N_s is increased.

If N_s = 100, the standard normal probability distribution is adequate for describing the form of the sampling distribution for (p + q)^100 when p = 3/4 and q = 1/4. In other words, the skewness characteristic of such a distribution is negligible, and the form of the main part of the distribution (within the limits of M ± 3.0σ) is almost bilaterally symmetrical and mesokurtic. Thus, the normal distribution in Fig. 12:2 may for all practical purposes be taken as the form of the sampling distribution when N_s is large and p ≠ q. It should be emphasized, however, that this treatment of sampling distributions for p ≠ q is warranted only on condition that N_s is large and that p is not much greater than .95 or much less than .05. In other words, if the difference between p and q exceeds .95 − .05 = .90, the samples must be unusually large if the skewness of the sampling distribution is to be negligible.
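The statement that the normal curve is adequate for N_s = 100 and p = 3/4 can be checked by summing the exact binomial probabilities inside M ± 3σ and comparing the total with the normal value .9973. A minimal sketch follows (names illustrative). The exact coverage comes out close to, though slightly below, .9973, and the lower tail (fewer males) is a little heavier than the upper tail, a trace of the remaining negative skew.

```python
import math

def binomial_pmf(n_s: int, k: int, p: float) -> float:
    """Probability of exactly k p events (males) in a sample of n_s."""
    return math.comb(n_s, k) * p**k * (1 - p) ** (n_s - k)

N_S, P = 100, 0.75
mean = N_S * P                          # 75 males
sigma = math.sqrt(N_S * P * (1 - P))    # about 4.33 males
lo, hi = mean - 3 * sigma, mean + 3 * sigma

pmf = [binomial_pmf(N_S, k, P) for k in range(N_S + 1)]
inside = sum(pr for k, pr in enumerate(pmf) if lo <= k <= hi)
lower_tail = sum(pr for k, pr in enumerate(pmf) if k < lo)   # fewer males
upper_tail = sum(pr for k, pr in enumerate(pmf) if k > hi)   # more males

print(f"exact P(M - 3 sigma <= f <= M + 3 sigma) = {inside:.4f}  (normal curve: .9973)")
print(f"lower tail = {lower_tail:.4f}, upper tail = {upper_tail:.4f}")
```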

E. THE PRECISION (RELIABILITY) OF SAMPLE RESULTS AND THE SIZE OF SAMPLES

The adequacy of a sample result, as we saw in the preceding chapter, is dependent upon both (1) its character and (2) its precision. The character of a sample result is of prime importance: unless we know the nature of a sample, we cannot with confidence study a given universe in the light of the sample result. From the point of view of applying the logic of probability and statistical inference to a sample result, we must draw random or stratified-random samples of the universe to be studied. Samples of this character are adequate provided they are sufficiently large to yield the precision needed for the particular investigation.

Precision Measured by the Standard Error 

The precision of any statistic derived from a sample result is measured in terms of the standard error of the statistic, i.e., the standard deviation of the sampling distribution of the statistic. In the case of frequencies, we saw that the standard error of a frequency, σ_f, is equal to √(N_s pq), where N_s is the size of the sample, p is the proportion of the events of a class in the universe under consideration, and q is equal to 1.0 − p. As the size of random samples increases, the precision of the result also increases. Figs. 12:1-12:3 suggest, however, that the variability of the sampling distributions increases as N_s increases; that is, the abscissa scales of these distributions are wider for the larger samples. If the variability of the sampling distribution of a statistic really did increase with the size of the sample, the precision of the statistic could not increase as N_s increases, because precision is measured directly in terms of the variability of the sampling distribution, viz., its standard error.





Precision Generally a Function of √N_s

The contradiction suggested by Figs. 12:1-12:3 is only apparent; it is not real. Actually the variability of these sampling distributions will be seen to decrease when their respective variabilities are considered relative to their respective scales of measures (i.e., frequencies of males per sample result). Thus the standard error of the sampling distribution for N_s = 2 (Graph A, Fig. 12:1) is .707 males; for N_s = 12 (Graph D in that figure), it is 1.73 males. However, the frequency of males per sample result can vary only from zero to 2.0 when N_s = 2, whereas it can vary from zero to 12.0 when N_s = 12. Therefore, relative to the size of the samples, and hence to the different possible results per sample for these sampling distributions, a σ_f of .707 males when N_s = 2 indicates a greater margin of possible error than a σ_f of 1.73 males when N_s = 12. This relationship can be seen more readily if the variability of sampling distributions of frequencies is measured in terms of the size of the sample, N_s. For this, the ratio of σ_f to N_s, where σ_f = √(N_s pq), is as follows:

$$\frac{\sigma_f}{N_s} = \frac{\sqrt{N_s pq}}{N_s} = \sqrt{\frac{pq}{N_s}}$$

Since f/N_s is the proportion, p, of the class of events under consideration (in this case males), this standard error is symbolized by σ_p:

$$\sigma_p = \sqrt{\frac{pq}{N_s}} \qquad [12{:}12]$$

Standard error of a proportion
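The contrast between σ_f and σ_p can be made concrete. In this sketch (function names are illustrative), σ_f = √(N_s pq) grows as N_s increases while σ_p = σ_f/N_s shrinks, which is the resolution of the apparent contradiction.

```python
import math

def se_frequency(n_s: int, p: float) -> float:
    """Standard error of a frequency: sqrt(N_s * p * q)."""
    return math.sqrt(n_s * p * (1 - p))

def se_proportion(n_s: int, p: float) -> float:
    """Standard error of a proportion (Formula 12:12): sqrt(p * q / N_s)."""
    return math.sqrt(p * (1 - p) / n_s)

for n_s in (2, 12):
    sf, sp = se_frequency(n_s, 0.5), se_proportion(n_s, 0.5)
    print(f"N_s = {n_s:2d}: sigma_f = {sf:.3f} males, sigma_p = {sp:.3f}")
```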

When N_s = 2 and the parameter proportion of p events (males) for the universe under consideration is .50, q will equal

1.0 − .50 = .50

and

$$\sigma_p = \sqrt{(.50)(.50)/2} = .354$$

If the size of the sample is doubled,

$$\sigma_p = \sqrt{(.50)(.50)/4} = .250$$

(Cf. Table 12:3.) Thus, the standard error of the parameter proportion, and hence the variability of the sampling distribution, is reduced. Stated otherwise, the precision of the result is increased. However, the precision is not doubled; i.e., the standard error is not reduced to half its size when the size of the sample is doubled. In order to reduce the value of σ_p = .354 by half, the size of the sample must be quadrupled. If N_s = 2 is quadrupled, N_s will equal 8 and

$$\sigma_p = \sqrt{(.50)(.50)/8} = .177$$

which is half the value of .354, the standard error for N_s = 2.


The precision of a sample result thus appears to be directly proportional to the square root of the size of the sample. Or, stated in terms of error (the opposite of precision), the standard error of a result is inversely proportional to the square root of the size of the sample. Whether this relationship holds strictly depends upon the probability implications of the measure of error itself (in this case, the standard error of a proportion). If the standard error of a sampling distribution of proportions, taken with respect to the mean of the sampling distribution, marks off the same percentage of probabilities (or fraction of the total area) regardless of the size of N_s, then this relationship between the precision of a result and the size of the sample will hold generally. We have seen, however, that when N_s is less than 25 or 30 cases, sampling distributions do not have the standard, normal probability form. As N_s approaches 2, they become increasingly leptokurtic; and when p is not equal to q, they are skewed. Thus the percentage of probabilities between the mean ± 1 standard error for small sample distributions is less than it would be for large sample theory. In the latter case, as indicated in Table I, Appendix B, M ± 1σ includes .6826 of the total area (or about 2/3 of the whole), and M ± 3σ includes .99730 of the total area (or nearly 100%).

For small sample theory these probability values are less. This is indicated in Table 12:3 for the sampling distributions of small sample theory, when developed as continuous rather than as discrete distributions of the binomials in Figs. 12:1 and 12:3. When N_s = 2, for example, the limits of M ± 1σ include an area equal to only 50% of the total distribution, and less than 80% of the total area is included within the limits of M ± 3σ.
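The small-sample entries of Table 12:3 appear to be areas under Student's t curve with N_s − 1 degrees of freedom; that reading is our assumption, though it reproduces the tabled values for N_s = 2, 3, and 4. The sketch below integrates the t density numerically with Simpson's rule, using only the standard library (the function names are illustrative).

```python
import math

def t_density(t: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def area_from_mean(t_max: float, df: int, steps: int = 2000) -> float:
    """Area under the t curve between 0 and t_max, by Simpson's rule (steps must be even)."""
    h = t_max / steps
    total = t_density(0.0, df) + t_density(t_max, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return total * h / 3

for n_s in (2, 3, 4, 5, 6, 8, 10, 12, 16, 20):
    df = n_s - 1  # assumed: degrees of freedom = N_s - 1
    one, three = area_from_mean(1.0, df), area_from_mean(3.0, df)
    print(f"N_s = {n_s:2d}: 0 to 1 = {one:.4f}, 0 to 3 = {three:.4f}, +/-3 = {2 * three:.4f}")
```

As N_s grows, the area from 0 to 1 moves toward .3413, the large-sample value.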

There is thus a lawful relationship between the precision of a statistic and the size of the sample from which it is derived. The function describing this relation is the same for most statistics of large sample theory: precision is directly proportional to the square root of the size of the sample. As indicated in Table 12:3, in order to double the precision of a result when N_s = 25 and σ_p = .100, N_s must be quadrupled: when N_s = 100, σ_p = .05. Or, if the parameter proportion is .75 instead of .50, σ_p = .087 when N_s = 25, and is half this size, viz., .043, when N_s = 100. On the other hand, when N_s is much less than 25, this relationship does not hold precisely. As the table shows, the departures from the probabilities of large sample theory are greatest for the smallest samples, where N_s = 2, 3, 4, etc. The Fisher-Student t statistic has been developed to provide a basis for probability estimates in the latter case. The research worker should be acquainted with its meaning and its usefulness in evaluating the results of small samples. Generally, however, the samples used in research investigations in psychology and related fields contain at least 25 or 30 cases; consequently the sampling distributions of large sample theory are ordinarily those to be used.
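Because σ_p = √(pq/N_s), the sample size needed for a chosen standard error follows by solving for N_s: N_s ≥ pq/σ_p². This is a sketch under that large-sample assumption; `required_sample_size` is a hypothetical helper, and exact rational arithmetic is used so that values such as .05 do not pick up floating-point error before rounding up.

```python
import math
from fractions import Fraction

def required_sample_size(p: str, target_se: str) -> int:
    """Smallest N_s with sqrt(p*q/N_s) <= target_se, i.e. N_s >= p*q / target_se**2.

    Arguments are decimal strings so they convert exactly to fractions.
    """
    p_frac, se = Fraction(p), Fraction(target_se)
    return math.ceil(p_frac * (1 - p_frac) / se**2)

# Halving the standard error quadruples the required sample.
for se in ("0.1", "0.05", "0.025"):
    print(se, required_sample_size("0.5", se))
```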


Precision and Reliability

The concept of reliability has been more widely used in sampling and analytical statistics than that of precision. Actually, the two terms are synonymous. Both refer to the degree and nature of the variability that is characteristic of the sampling distribution of a statistic: the less the variability, the more reliable, or precise, the result. The use of the term reliability goes back historically to the treatment of errors of observation and measurement in physics, psychophysics, etc. The less the error of observation or of measurement, the more reliable the result; and the greater the error, the more unreliable the result. As in sampling statistics, the effect of chance factors on observation and measurement is measured in terms of the standard error of the statistic (or in terms of its Probable Error).‡

From Table 12:3 we can see not only the effect of the size of random samples on the precision or reliability of a result, but also what size of such a sample is required to obtain a given degree of precision for a universe whose parameter proportion is taken as either .50 or .75. Furthermore, the variability of the sampling distributions for parameter proportions other than .50 is less than that for p_h = .50, because the product pq is largest when p = q = .50. Thus, the standard errors for p_h = .50 in column 2 of the table represent maximum values of σ_p for given sizes of samples. In other words, σ_p for p_h ≠ .50 is less than σ_p for p_h = .50.

‡ For the normal distributions of large sample theory, the probable error, often referred to as P.E., is equal to .6745, or about two-thirds, of the standard error, and sets the limits of 25% of the probabilities above or below the mean.

Table 12:3. Precision and Size of Sample

The Variability of the Sampling Distributions of Proportions as a Function of (1) the Size of the Sample and (2) the Parameter Value of the Mean Proportion *

(1) Size of sample, N_s
(2) Standard error of a proportion, σ_p, when p_h = .50
(3) Standard error of a proportion, σ_p, when p_h = .75
(4) Probabilities: proportion of sample results to be expected between M and M + 1σ_p (or between M and M − 1σ_p)
(5) Probabilities: proportion of sample results to be expected between M and M + 3σ_p (or between M and M − 3σ_p)
(6) Probabilities: proportion of sample results to be expected within the limits of M + 3σ_p and M − 3σ_p

  (1)      (2)       (3)       (4)       (5)       (6)

Small Sample Theory †
    2     .354                .2500     .3976     .7952
    3     .289                .2887     .4523     .9046
    4     .250                .3045     .4712     .9424
    5     .224                .3130     .4800     .9600
    6     .204                .3184     .4850     .9700
    8     .177                .3247     .4900     .9800
   10     .158                .3283     .4925     .9850
   12     .144                .3306     .4940     .9880
   16     .125                .3334     .4955     .9910
   20     .112                .3351     .4963     .9926

Large Sample Theory
   25     .100      .087      .3413     .49865    .9973
   50     .071      .061      .3413     .49865    .9973
   75     .058      .050      .3413     .49865    .9973
  100     .050      .043      .3413     .49865    .9973
  400     .025      .0217     .3413     .49865    .9973
 1600     .0125     .0108     .3413     .49865    .9973
 6400     .00625    .0054     .3413     .49865    .9973
                              .3413     .49865    .9973

* The smaller σ_p, the greater the precision (or reliability) of the result.
† These probability values for small samples are from Student's Table in Peters and Van Voorhis, op. cit., pp. 488-491.

The precision, or reliability, of sample results for different sizes of samples is further illustrated by Fig. 12:4. The vertical line represents the parameter value of the measure (i.e., the value of the measure by hypothesis, or for the hypothesis to be tested). The horizontal scales represent only the abscissas of the sampling distributions (not the areas) for N_s = 25, 50, 75, 100, 400, 1600, and 6400, the standard errors of which are given in Table 12:3 for a particular measure. The sampling distribution of each is assumed to be normal, as in large sample theory. The length of each scale is taken as equal to the mean plus and minus 3 times the standard error of the measure. The ranges of the parameter value of the measure ± 1σ and ± 2σ are also marked off.

Fig. 12:4 thus serves to show the relative precision of the sample results for 
any statistic whose sampling distribution has, or can be assumed to have, 
the form of the standard, normal probability curve of large sample theory. 
The particular statistic in this illustration is the percentage of p events in 
random samples. Since a percentage is a proportion taken to a base of 100 
instead of 1.0, the standard errors of percentages are equal to the standard 
errors of proportions times 100: 

$$\sigma_{\%} = 100\sqrt{\frac{pq}{N_s}} \qquad [12{:}13]$$

Standard error of a percentage

If we draw random samples of 25 cases each from a universe whose parameter percentage is 50%, we would in the long run expect about 68% of the sample results to yield percentages of p events from 40% to 60%, since the range of M_% ± 1σ_% = 50% ± 10.0% = 40% to 60%. We would expect about 95% of the results to fall between 30% and 70%, since this is the range of M_% ± 2σ_% when N_s = 25 and %_h = 50%; and about 99.7% to fall between 20% and 80%, since this is the range of M_% ± 3σ_%. If the size of the random samples is quadrupled to N_s = 100, we would expect the percentages of p events per sample to vary from 45% to 55% in about 2/3 of the samples (P = .6826); from 40% to 60% in about 95% of them (P = .9544); and from 35% to 65% in more than 99% of them (P = .9973). If N_s = 1600, the percentage of p events per sample result would be expected to vary, more than 99% of the time, from only 46.25% to 53.75%; and if N_s = 6400, from only 48.1% to 51.9%.
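The ranges quoted in this paragraph follow from Formula 12:13. A minimal sketch (the helper names are illustrative) that computes M_% ± kσ_% about a parameter percentage of 50%:

```python
import math

def se_percentage(n_s: int, p: float) -> float:
    """Standard error of a percentage (Formula 12:13): 100 * sqrt(p*q / N_s)."""
    return 100 * math.sqrt(p * (1 - p) / n_s)

def expected_range(n_s: int, pct_h: float = 50.0, k: int = 1) -> tuple[float, float]:
    """Limits M% - k*sigma% and M% + k*sigma% about the hypothetical percentage pct_h."""
    se = se_percentage(n_s, pct_h / 100)
    return pct_h - k * se, pct_h + k * se

for n_s in (25, 100, 1600, 6400):
    limits = [expected_range(n_s, k=k) for k in (1, 2, 3)]
    print(n_s, [f"{a:.3f}% to {b:.3f}%" for a, b in limits])
```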

Thus the sample values of a statistic (in Fig. 12:4, a percentage of p events) 
become a more precise or reliable measure of the parameter as the size of 
samples is increased. If a sample were infinite in size, the value of a sample 
result would be precisely that of the parameter. 





Fig. 12:4. The Variability in Sample Results to Be Expected for Different-Sized Samples of a Universe Whose Parameter Percentage Is Taken as 50%, Measured on the Abscissa of the Sampling Distribution in Terms of the Standard Error of the Percentage*

[Figure: one horizontal scale for each of N_s = 25, 50, 75, 100, 400, 1600, and 6400, centered on the parameter percentage %_h = 50% and marked at M_% ± 1σ_%, ± 2σ_%, and ± 3σ_%; the ± 3σ_% limits run from 20% to 80% for N_s = 25 and narrow to 48.1% to 51.9% for N_s = 6400.]

* The form of the sampling distribution in each case is assumed to be the standard, normal probability curve. The range of variation is shown for M_% ± 1σ_%, M_% ± 2σ_%, and M_% ± 3σ_%.

The P value of sample results within the range of M_% ± 1σ_% = .6826.
The P value of sample results within the range of M_% ± 2σ_% = .9544.
The P value of sample results within the range of M_% ± 3σ_% = .9973.


What is a likely or unlikely result for a single random sample of a given size? As indicated earlier, the answer to this question is based on reasonable expectancy for a given research situation. Generally, however, we would consider a single sample result whose percentage of p events is within the range of M_% ± 2σ_% to be likely for the given hypothesis (in Fig. 12:4, the hypothesis that the parameter percentage is 50%). On the other hand, we would consider a single sample result whose percentage of p events is beyond the range of M_% ± 2.5σ_% or M_% ± 3σ_% to be unlikely for the hypothesis.
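The rule of thumb just stated can be written as a small classifier. This is a hypothetical sketch: the function name, the "borderline" label for results falling between the two criteria, and the default cut-offs of 2 and 3 standard errors are our choices (the text allows either 2.5 or 3 for the second criterion).

```python
def judge_result(sample_pct: float, pct_h: float, se_pct: float,
                 likely_within: float = 2.0, unlikely_beyond: float = 3.0) -> str:
    """Classify a single sample percentage relative to the hypothetical percentage pct_h."""
    distance = abs(sample_pct - pct_h) / se_pct   # distance in standard errors
    if distance <= likely_within:
        return "likely"
    if distance > unlikely_beyond:
        return "unlikely"
    return "borderline"  # between the two criteria; left to the investigator's judgment

print(judge_result(52.0, 50.0, 1.58))                       # about 1.27 standard errors
print(judge_result(54.0, 50.0, 1.58))                       # about 2.53 standard errors
print(judge_result(54.0, 50.0, 1.58, unlikely_beyond=2.5))  # stricter second criterion
```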

In the following chapters we shall consider further the question of what is 
likely and unlikely for various hypotheses on the basis of chance, and develop 
appropriate Tests of Significance for various types of statistics. 


EXERCISES 

1. Define the concept of probability as used in statistics. 

2. Upon what considerations is a probability ratio based? 

3. How is the fact that a single result has no probability value dealt with in statistical inference?

4. Under what circumstances is the binomial distribution similar to the normal 
bell-shaped probability distribution of large sample theory? 

5. Define the product and the addition theorems of probability and describe how they are employed in determining the probability of events.

6. Toss ten pennies fifty times and record the number of heads on each trial. Make a 
histogram of the sampling distribution of the results, and compare the mean 
and standard deviation of the sampling distribution with the theoretical results 
that should be obtained in the long run, on the assumption that the coins are 
fair. 

7. In Exercise 6, what are the probabilities of obtaining (in the long run) as many
as 2 heads per trial? At least 7 heads per trial? No more than 3 heads per trial? 

8. What is a Test of Significance? What information is needed in order to make such 
a test? 

9. How are the implications of the normal probability distribution utilized to yield 
a probability estimate for T? 

10. In what sense is there more than one type of “normal” sampling distribution?
Cite several examples of different types and describe the circumstances in which 
they are used. 

11. What is the difference between T and the t statistic? 

12. From the point of view of sampling theory, when is a sample considered small? 

13. What does a sampling distribution that is both skewed and leptokurtic look like? 
Under what circumstances are sampling distributions of this kind obtained? 

14. In what sense does the standard error of a statistic measure the adequacy of a result, and in what sense does it not?

15. What is the relationship between the precision or reliability of a statistic and the 
size of the sample from which it is obtained? 

16. How much larger does a random sample of 150 measurements need to be for the 
precision of the results to be tripled? 



CHAPTER 13 


Hypotheses and Tests of Significance 


A. LIKELIHOOD AND CONFIDENCE CRITERIA 

The usual research situation in the biological and social sciences requires the 
use of sampling and analytical statistics because the parameter values of 
the universes studied are ordinarily unknown and not obtainable. We saw in 
Chapter 11 that the initial problem is designing an investigation so that the 
samples will be adequate. Sample results are then analyzed so that their likely 
implications, i.e., what they signify, can be determined. The preceding 
chapter made it clear that probability theory is essential to this analysis, 
which culminates in what has come to be known in statistical parlance as a 
Test of Significance. 

Postulation of Parameters 

Usually there are no empirically determined parameter values of the universe to be studied. Furthermore, ordinarily the results of only one or, at the most, a few samples are available. Hence, we usually do not have empirically established sampling distributions of the statistics in which we are interested, nor do we have empirical measures of the standard deviations of such distributions (the standard errors of a measure). Therefore, as a rule, we have no empirically determined probability values for a given kind of result.

Such a research position thus requires the postulation of parameter values for relevant statistical hypotheses. The implications of such hypotheses are tested in the light of probability theory and of statistics derived from sample results. In order to test the implications of a statistical hypothesis, we have to make some assumption about the form of the sampling distribution of the parameter of the hypothesis. We can then estimate its standard error, which will serve as the basis for determining a relevant probability value. Finally, since the theory of probability describes the behavior of an indefinitely prolonged series of samples rather than a single sample, we have to evaluate a sample result in terms of whether or not it is likely for the hypothesis under consideration.

Hypotheses Give Direction and Meaning to Research 

The logic of sampling and analytical statistics is identical with the logic of experimental science generally, in so far as the relationship between hypotheses and empirical data is concerned. That is, the logical way to begin any research investigation is to start with a hypothesis, and then obtain an appropriate sample of data in order to test the implications of the hypothesis. But, it may be contended, how can we begin a statistical investigation with a hypothesis if our goal is to determine the facts about a situation? Are we not likely to prejudge the empirical character of a result if we begin with a hypothesis about it? The answer, of course, is no, provided we do not permit the hypothesis to bias our observations or warp our conclusions.

* Cf. H. A. Larrabee, Reliable Knowledge, Houghton Mifflin, Boston, 1945.

We begin with a hypothesis so that a research investigation will not be an aimless collection of data. A hypothesis gives direction and meaning to research. We then try to obtain relevant facts (or sample data) and determine whether or not they are likely for the particular hypothesis. If they are not likely, we reject the hypothesis and consider its logical alternatives. But if they are likely, we may accept the hypothesis as a tenable proposition about the universe studied. However, no amount of sampling and calculation will prove the truth of a hypothesis in the strict logical sense of necessary inference. Rather, a hypothesis may be found acceptable because (1) its rejection is not warranted by the evidence (sample data), and (2) alternative hypotheses are not more likely or more acceptable in view of the evidence.

In the preceding chapter some implications of sampling and probability were illustrated for a universe assumed to be equally divided between males and females. Let us now consider a more typical example of empirical research: a universe in which the division of males and females is unknown. A merchant wishes to find out what percentage of his customers are men. Let us assume that we can obtain a random sample consisting of 1000 of his customers, and that 52% are men. The parameter value of the percentage of men customers is unknown. It is this value which the merchant wishes us to estimate as accurately as possible from the sample result.

A sampling distribution of the percentages of men, where N_s = 1000, is not available. It is therefore impossible to compute from empirical data the standard error of such a distribution. Consequently, it is also impossible to calculate an empirical probability value for the distribution of the sexes in the universe in question. Under these circumstances, what can we do?

If we have a random sample of 1000 customers (and this was assumed to be the case), then we can estimate likely parameter values of the percentage of men customers. Although such an estimate cannot be exact, it will indicate a range of likely parameter values. The smaller this range, the more reliable the estimate will be.


The Probability Estimate 

Let us proceed by postulating an even division of the sexes among the merchant's customers. This is our initial statistical hypothesis. In effect, it states that 50% of all his customers are men. The parameter percentage is taken by hypothesis as 50. We can assume that a sampling distribution consisting of 1000 cases per random sample would be normally distributed for this particular hypothesis. The best estimate of the standard error of such a sampling distribution of percentages is equal to 100√(pq/N_s), where p is the postulated proportion of men, q is equal to 1.0 − p, and N_s is the size of the sample. The standard error is therefore:

$$\sigma_{\%} = 100\sqrt{\frac{(.50)(.50)}{1000}} = 100(.0158) = 1.58\% \text{ or } 1.6\%$$


By means of this measure we can estimate the expected variability (probability) of sample results above and below the postulated parameter percentage of 50. For example, the probabilities are approximately .16 that such random samples will in the long run yield a percentage of males equal to or greater than 51.6%, because the postulated parameter value (50%) plus a value one standard deviation above the mean of the sampling distribution, viz. 1.6%, is equal to 51.6%. In a normally distributed sampling distribution of large sample theory, with a total area equal to 1.00 (or unity), .34 of the results of random samples should yield proportions of males between the parameter percentage of 50 and a sample result one standard error above the mean of the sampling distribution (in this case, 51.6%). There remains .16 of such samples that will in the long run yield percentage values equal to or greater than 51.6%. This is illustrated in Fig. 13:1.

[Fig. 13:1. Sampling Distribution for Parameter Percentage of 50% (Where the Total Area = 1.0 and N_s = 1000). Half of the area (.50) lies above the mean, .34 between the mean and +1σ; the tail beyond +1σ is the expectancy of sample results equal to or greater than percentages of 51.6%.]

The actual sample result yielded 52% males. Is this a percentage which we would consider likely to occur in a single random sample from a universe whose parameter percentage is 50? In order to answer this question, we must first estimate the probability of a sample result as great as 52% for the hypothesis under consideration. To do this, we locate the distance (in terms of the standard deviation of the sampling distribution) of the sample percentage from the parameter percentage on the normal probability curve, and then determine the proportion of the area (probabilities) above this point. This has been done in Fig. 13:2, and from it we see that a sample result of 52% males is 1.25 standard deviations above the postulated parameter of 50% males. Referring to Table I, Appendix B, for the distribution of the normal probability function, we find that the proportion of the area between the mean and a distance 1.25 standard deviations above it is .39. The tail of the distribution above this point therefore includes the difference between .50 and .39, or .11 of the total area. Consequently, the probabilities are .11, or 11 in 100, that in the long run random samples of 1000 cases each, drawn from the universe of our statistical hypothesis, will yield percentages of males equal to or greater than 52%. This value of .11 is the probability value needed for a Test of Significance.

[Fig. 13:2. Sampling Distribution for Parameter Percentage of 50% with the Test Ratio of the Test of Significance Equal to 1.25 (Where the Total Area = 1.0 and N_s = 1000). The tail beyond T = 1.25 is the expectancy of sample results equal to or greater than percentages of 52%.]
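The Table I look-ups above can be reproduced by computing areas of the standard normal curve directly. This sketch swaps the printed table for Python's standard-library `statistics.NormalDist` (Python 3.8+); the helper names are hypothetical.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1, total area 1.0.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)

def area_mean_to_z(z: float) -> float:
    """Area between the mean and a point z standard deviations from it."""
    return abs(STANDARD_NORMAL.cdf(z) - 0.5)

def area_above(z: float) -> float:
    """Proportion of the total area (probabilities) lying above z."""
    return 1.0 - STANDARD_NORMAL.cdf(z)

for z in (1.0, 1.25):
    print(f"z = {z}: mean to z = {area_mean_to_z(z):.2f}, above z = {area_above(z):.2f}")
# z = 1.0: mean to z = 0.34, above z = 0.16
# z = 1.25: mean to z = 0.39, above z = 0.11
```

The values match the .34/.16 split of Fig. 13:1 and the .39/.11 split used for the 52% result.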

The Test of Significance and the Test Ratio (T) 

A Test of Significance takes the form of the following ratio:

$$T = \frac{s - h}{\sigma_s}$$

where s is the sample value of a statistic, h the parameter value of the statistical hypothesis to be tested, and σ_s the standard error of the measure (or statistic) under consideration. For the preceding data, T is as follows:

$$T = \frac{52\% - 50\%}{1.6\%} = \frac{2.0\%}{1.6\%} = 1.25$$

As already implied in the preceding paragraph, when T = 1.25, P = .11. This means that there are approximately 11 chances in 100 of obtaining sample results equal to or greater than 52% for the hypothesis under consideration, i.e., a parameter value of 50%. (Cf. Table II, Appendix B, for probability values of T ratios from zero to 3.0.)
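The Test Ratio and its P value can be sketched together. This is an illustration under the large-sample assumption that the sampling distribution is normal; the function names are hypothetical, and `NormalDist` stands in for Table II.

```python
from statistics import NormalDist

def compute_test_ratio(s: float, h: float, sigma_s: float) -> float:
    """Test Ratio T = (s - h) / sigma_s."""
    if sigma_s <= 0:
        raise ValueError("standard error must be positive")
    return (s - h) / sigma_s

def upper_tail_probability(t: float) -> float:
    """P of results equal to or greater than T, assuming a normal sampling distribution."""
    return 1.0 - NormalDist().cdf(t)

t = compute_test_ratio(52.0, 50.0, 1.6)  # sigma rounded to 1.6%, as in the text
p = upper_tail_probability(t)
print(f"T = {t:.2f}, P = {p:.2f}")  # T = 1.25, P = 0.11
```

If the unrounded standard error of 1.58% is used instead, T comes out near 1.26 and P near .10. The text's rounding of σ to 1.6% accounts for the small difference, which does not change the conclusion.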

We now need to evaluate the significance of this result, whose T ratio is 
1.25 and whose P value is .11. We must decide whether the sample result is 
or is not likely for the hypothesis tested. This is the problem of likelihood. 
In dealing with it, we shall require confidence criteria, on the basis of which 
we may reject or not reject a statistical hypothesis in the light of the T ratio 
obtained from the Test of Significance. 

We have already emphasized that the theory of probability describes the relative frequency of occurrence of an indefinitely prolonged series of sample results and that a single sample result has no probability value. In the long run we would expect that approximately 11 out of every 100 random samples, where N_s = 1000, would yield percentages of males equal to or greater than 52% for the universe of the hypothesis (parameter = 50%). What we need to do is to decide whether our particular sample result is or is not likely, on the basis of random errors in sampling and measurement for a universe whose parameter percentage is 50. In making this decision, we have no absolute principles that are universally valid in all kinds of statistical situations to guide us. Consequently, we usually employ confidence criteria which have been generally found satisfactory in similar investigations. What should our criteria be, on the basis of this experience?

Likelihood and Confidence Criteria * 

There is general agreement that when the P value of T is equal to or greater than .10, the statistic of the sample result can confidently be considered as a likely result for the hypothesis tested, unless on other grounds there is a strong reason to reject the hypothesis. If we apply this criterion of P ≥ .10 to our example, which yielded a P value of .11, we can confidently conclude that the merchant's sample result is likely for a universe in which male and female customers are equally divided. In other words, we cannot with confidence reject the hypothesis that 50% of his customers are males. The sample result of 52% males is too likely for the hypothesis tested to warrant rejection of the hypothesis.

Many investigators take a P value of .05 as the limiting confidence criterion in evaluating the likelihood of a sample result for a hypothesis. In other words, if a Test of Significance yields a T ratio for which the estimated probability value is equal to or greater than .05, the sample result is judged to be only a chance divergence from the parameter of the hypothesis being tested. The difference between s and h would be expected on the basis of random errors in sampling and measurement. On the other hand, if P is less than .05, the difference is sometimes considered significant. That is, in some research situations a sample result is judged to be unlikely for the hypothesis if the P value of the T ratio is less than .05 (less than 5 chances in 100). These are of course common-sense procedures to be used only when there is no strong reason to accept or reject the hypothesis.

All research investigators agree that when a Test of Significance yields a T ratio whose P value is equal to or less than .001 (1 chance in 1000), the sample value is unlikely for the statistical hypothesis tested. Some investigators, however, consider this too rigorous a criterion in many research situations. A P value of .01 is consequently taken as the limiting confidence criterion in many cases. Thus, if a Test of Significance yields a T ratio whose probability value is estimated to be equal to or less than .01 (1 chance in 100), the sample result is judged to be unlikely for the hypothesis. But if the P value is greater than .01, the sample result is judged to be likely in some cases.

* Cf. R. A. Fisher, "Inverse Probability and the Use of Likelihood," Proceedings of the Cambridge Philosophical Society, 28:257-261, 1932.

If we test the hypothesis that the division of males and females among the merchant's customers is 75% males and 25% females, we obtain a value of −16.4 for T, as follows:

$$T = \frac{52\% - 75\%}{1.4\%} = -16.4$$

where 52% is the percentage of males yielded by the sample of 1000 customers; 75% is the parameter value of the hypothesis now being tested; and 1.4% is the new estimate of the standard error of the sampling distribution of this hypothesis, $\sigma_{\%} = 100\sqrt{\frac{(.75)(.25)}{1000}} = 1.4\%$. The minus sign with a T value, in this case −16.4, denotes the direction of the value of the statistic from the parameter for the hypothesis. Negative T ratios therefore mean that the statistic is less in value than the parameter.

The Test of Significance for this new hypothesis yields a T ratio greater than 16 in absolute value. In other words, the sample result is more than 16 standard deviation units from the parameter value of 75%, whose sampling distribution is assumed to be similar in form to the standard, normal probability distribution. The table of values for the normal probability integral (Table I, Appendix B) does not include z (or T) values greater than 5 because the area of the curve beyond 5 standard deviation units is only a very small fraction of 1%. For this Test of Significance the P value of the T ratio is considerably less than .001. Hence we can confidently reject this particular statistical hypothesis; that is, we can be confident that the sample result was not derived as a random sample from a universe of customers, 75% of whom were males.
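The 75% hypothesis can be checked the same way. In this sketch (hypothetical helper name), the standard error is rounded to 1.4% as in the text, and the lower-tail probability is computed with `math.erfc`, which stays accurate far out in the tail where the expression 0.5 * (1 + erf(x)) underflows to 0.0.

```python
import math

def percentage_standard_error(p: float, n: int) -> float:
    """sigma_% = 100 * sqrt(p * q / N_s), with q = 1 - p."""
    return 100.0 * math.sqrt(p * (1.0 - p) / n)

sample_pct, hyp_pct, n = 52.0, 75.0, 1000

# Standard error under the 75% hypothesis, rounded to one decimal as in the text.
sigma = round(percentage_standard_error(hyp_pct / 100.0, n), 1)
t = (sample_pct - hyp_pct) / sigma

# Lower-tail probability P(Z <= T), i.e., of a result this far below the parameter.
p_below = 0.5 * math.erfc(-t / math.sqrt(2.0))

print(f"sigma = {sigma}%, T = {t:.1f}, P = {p_below:.1e}")
```

The printed P is an astronomically small number, far below the .001 criterion.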

Confidence criteria that are used in research are generally taken within the limits of the preceding P values, i.e., P = .05 and P = .001. A P value of from .05 to .02 is usually taken as the limiting criterion for results judged to be likely for the hypothesis tested. On the other hand, a P value of from .01 to .001 is generally taken as the limiting criterion for results judged to be unlikely for the hypothesis tested. A P value of .05 is characterized as the 5% confidence level; a P value of .02 as the 2% confidence level, etc.* These criteria are sometimes referred to as Coefficients of Risk.†

In view of the foregoing confidence criteria, Tests of Significance that yield P values between .02 and .01 may warrant only a tentative or doubtful inference. Thus, if a Test of Significance yields a T ratio whose P value is .015, we might consider the implications of the result doubtful. We might not reject the hypothesis with confidence, since P is not less than .01. Nor could we, with confidence, conclude that the hypothesis is likely, since P is less than .02.

These distinctions may seem to be somewhat arbitrary, and they are, emphatically. It is also to be emphasized that the criteria for likely and unlikely results should not be taken as a single point value on a probability scale. It would be unsound to take a single confidence criterion, such as P = .01, for all types of problems and kinds of data, and then dogmatically accept as likely all T ratios yielding P values greater than .01, and reject as unlikely all results whose P values are less than .01. Generally it is recommended that the confidence criteria for an investigation be set up in advance of the Test of Significance, lest the P value of the T ratio bias the selection of the criteria.

* Ibid.

† J. G. Smith and A. J. Duncan, Sampling Statistics and Applications, McGraw-Hill, New York, 1945, p. 164.
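The recommendation to fix criteria before testing can be expressed by making the criteria an object that is built first and only then applied to P values. This is a minimal sketch; the class name and its two cut-offs are hypothetical, set here to the .02 and .01 levels discussed above, and P values falling between them are labeled doubtful.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ConfidenceCriteria:
    """Confidence criteria chosen in advance of the Test of Significance (hypothetical helper)."""
    likely_at_or_above: float = 0.05    # P at or above this: result likely for the hypothesis
    unlikely_at_or_below: float = 0.01  # P at or below this: reject the hypothesis with confidence

    def __post_init__(self):
        if not 0.0 < self.unlikely_at_or_below < self.likely_at_or_above < 1.0:
            raise ValueError("need 0 < unlikely cut-off < likely cut-off < 1")

    def judge(self, p: float) -> str:
        """Classify a P value as likely, unlikely, or doubtful (tentative)."""
        if p >= self.likely_at_or_above:
            return "likely"
        if p <= self.unlikely_at_or_below:
            return "unlikely"
        return "doubtful"

# Criteria fixed first; only then are results examined.
criteria = ConfidenceCriteria(likely_at_or_above=0.02, unlikely_at_or_below=0.01)
print(criteria.judge(0.11), criteria.judge(0.015), criteria.judge(0.0005))
# likely doubtful unlikely
```

Freezing the dataclass is one way to signal that the criteria are not meant to be adjusted after the P value is seen.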

The 5% Confidence Criterion for Likely Results (P ≥ .05)

Bearing in mind the preceding distinctions, we can agree that, in general, T ratios whose P values are greater than .05 (5 chances in 100) signify a sample result that is too likely for the hypothesis tested to warrant its rejection with confidence. The 5% criterion is indicative of a result that is about as likely as getting all heads in a toss of 4 or 5 coins. Whether or not we also decide to employ the 2% confidence criterion depends on the particular research situation.

The 0.1% Confidence Criterion for Unlikely Results (P ≤ .001)

We can also agree that T ratios whose P values are less than .001 (1 chance 
in 1000) signify a result that is so unlikely for the hypothesis tested as to 
warrant its rejection with confidence. The 0.1% criterion is indicative of a 
result that is about as unlikely as getting all heads in a toss of 10 coins. 
Whether or not we also employ the 1% confidence criterion again depends 
on the particular research situation. 
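The coin analogies can be checked exactly: the chance that n fair coins all land heads is (1/2)^n. A small sketch using exact fractions (the function name is hypothetical):

```python
from fractions import Fraction

def all_heads_probability(n_coins: int) -> Fraction:
    """Chance that n fair coins tossed together all come up heads: (1/2) ** n."""
    if n_coins < 0:
        raise ValueError("number of coins must be non-negative")
    return Fraction(1, 2) ** n_coins

for n in (4, 5, 10):
    p = all_heads_probability(n)
    print(f"{n:2d} coins: {p} = {float(p):.5f}")
```

Four coins give 1/16 (about .06) and five give 1/32 (about .03), so the 5% criterion falls between them; ten coins give 1/1024, very nearly the .001 of the 0.1% criterion.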


Confidence Criteria in Terms of T Ratios 

The above confidence criteria for likely and unlikely results are expressed in terms that are general for any kind of Test of Significance, because they are in terms of the probability value of a result. Once the P value of a result is determined for a given statistic, the confidence criteria in terms of P can be employed, regardless of the form of the sampling distribution. For Tests of Significance that are based on sampling distributions assumed to have the form of the standard, normal probability curve, the evaluation of the T ratio is often simplified by stating the confidence criteria in terms of T itself. Thus, in the literature, a T ratio of 3.0 or more is frequently referred to as a critical ratio. Since in large sample theory a T ratio equal to 3.0 has a P value of approximately .001, this is indicative of the 0.1% confidence criterion for an unlikely result, and hence the hypothesis can be rejected with confidence. Reference to Table II, Appendix B, for the normal probability integral shows the above to be the case; i.e., when T = 3.0, .49865 of the total area lies between the mean and a point three standard deviation units from it; hence, .50 − .49865 of the total area (probabilities) lies beyond 3.0σ. This difference is .00135, or approximately .001.

A T ratio of 3.0 or more can thus be taken as signifying a result that is 
unlikely for the hypothesis and therefore as warranting its rejection with 
confidence. However, a T ratio less than 3.0 does not warrant the inference 
that the result is likely for the hypothesis. As already stated, confidence 
criteria cannot logically be taken in terms of a given point P value that 
sharply divides likely from unlikely results. 

The T ratio equivalent of the 5% confidence criterion can readily be obtained for normal sampling distributions of large sample theory by means of Table I, Appendix B. A point on the abscissa of the normal probability distribution that cuts the area into two parts, viz. 95% and 5%, will be 1.65 standard deviation units from the mean because .45 of the total area lies between the mean and a point 1.65σ from it. Hence, a T ratio of 1.65 or less can generally be taken to signify a result that is likely for the hypothesis tested.

The T ratio equivalent of the 5% confidence criterion may also be equal to approximately 2.0. This will be the case when the probability of sample results for the parameter value of a given hypothesis is considered with respect to results at both tails of the sampling distribution. Thus, .475 of the total area of the normal probability distribution lies between the mean and a point 1.96σ above or below it. Hence 5% of the probabilities lie beyond ±1.96σ, or approximately ±2.0σ. For example, the limits of likely results for the hypothesis that the merchant's customers consist of 50% males would, by this 5% criterion, be 50% ± 2.0σ%. When N_s = 1000, σ% was found to be 1.6%. Hence 50% ± 2.0(1.6%) gives 46.8% and 53.2% as the limits of likely sample results for the hypothesis in question. On the other hand, 50% ± 3.0(1.6%) gives limits beyond which sample results would be unlikely for the hypothesis. These limits are 45.2% and 54.8%. A sample result of 52% males would thus be likely for the hypothesis, as earlier indicated.

The corresponding T ratio values of other percentage confidence criteria 
can be readily determined by means of the table of the normal probability 
function, provided of course that the sampling distributions for the statistics 
in question can be assumed to be similar in form to the standard, normal 
probability curve of large sample theory. The T value of the 1% confidence 
level is approximately 2.5. 
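The T ratio equivalents of the percentage criteria can be computed with the inverse of the normal distribution function. This sketch assumes a normal sampling distribution and uses the standard-library `NormalDist.inv_cdf` in place of reading Table I backwards; `critical_t` is a hypothetical helper name.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal curve

def critical_t(alpha: float, two_tailed: bool = False) -> float:
    """T ratio beyond which a proportion alpha of a normal sampling distribution lies."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be between 0 and 1")
    tail = alpha / 2 if two_tailed else alpha
    return Z.inv_cdf(1.0 - tail)

for alpha in (0.05, 0.01, 0.001):
    print(f"P = {alpha}: one tail {critical_t(alpha):.2f}, both tails {critical_t(alpha, True):.2f}")

# Area between the mean and 3.0 sigma, and the tail beyond it (cf. .49865 and .00135).
print(round(Z.cdf(3.0) - 0.5, 5), round(1.0 - Z.cdf(3.0), 5))
```

The one-tailed 5% value comes out near 1.645 (the text's 1.65) and the two-tailed value near 1.96; at 1% the two values are near 2.33 and 2.58, which bracket the text's approximate 2.5.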

T ratios of 2.0, 2.5, and 3.0 are thus convenient values for confidence criteria 
in terms of T. Although these latter T ratios do not precisely correspond to 
the 5%, 1%, and 0.1% confidence criteria respectively, they are widely used 
in sampling statistics because they are close enough for all practical purposes; 
furthermore, they are no more arbitrary than the percentage criteria. Since 
T ratios of 2.0, 2.5, and 3.0 are rounded values, they serve as convenient 
reference points for evaluating the results of Tests of Significance. 





B. CONFIDENCE LIMITS: TESTING A CONTINUUM OF HYPOTHESES


Many Statistical Hypotheses Can Be Tested 


Let us now return to the merchant who wishes to determine what percentage of his customers are men. We have already found that a random sample of 1000 customers which yielded 52% men is a likely result for the hypothesis that 50% are men. We also found that we could confidently reject the hypothesis that 75% of the customers are men. The two Tests of Significance we used led to one hypothesis which is tenable and another which is untenable. Are we warranted in concluding that 50% of the customers are men? We are not, because, as we shall shortly see, there are other tenable hypotheses, other hypotheses with parameter values which, for random samples of 1000 cases each, could on the basis of chance alone yield 52% men.

For example, the hypothesis of 51% men would be just as tenable as the hypothesis of 50% men. A Test of Significance for this new hypothesis is as follows:

$$T = \frac{52\% - 51\%}{100\sqrt{\frac{(.51)(.49)}{1000}}} = \frac{1.0\%}{1.6\%} = .625 \text{ or } .63$$

where 52% is the sample result; 51% is the parameter value of the present hypothesis; and the estimate of the standard error of the sampling distribution for this hypothesis is 1.6%.

Table I, Appendix B, reveals that approximately 23% of the area of the 
normal probability curve lies between the mean and a point .63 standard 
deviation units above it. The probabilities are therefore 27 in 100 (P equals 
.50 — .23 = .27) of obtaining, on the basis of chance alone, a sample result 
of at least 52% men from a universe of 51% men. Since the T ratio is less 
than 1.65 (and its P value is greater than .05), we cannot reject this hypothesis. 
Similarly, the hypothesis that 53% of the merchant’s customers are men 
would be tenable. 

We do not need to limit our hypotheses to percentages that differ by 1%. We could test the hypothesis that 52.5%, or that 52.189%, of the customers are men. Although such fractionation of parameter values is hardly relevant, the point remains that we can test practically an infinite number of different statistical hypotheses between the limits of a zero percentage and a percentage of 100, the hypotheses being different in the sense that their parameter values differ. In attempting to establish a definitive answer to the merchant's question, we shall set up the limiting values of those hypotheses which we can judge as likely or tenable, as well as limits for those which we can definitely reject as unlikely or untenable. The limits of tenable or likely hypotheses are thus the limits of a continuum of possible hypotheses; and, similarly, the limits for unlikely or untenable hypotheses mark off two continua of hypotheses which we can reject with confidence. Between these two sets of limits, we have two continua of doubtful or tentative hypotheses. (See Fig. 13:3.)

[Fig. 13:3. Sampling Distributions for a Continuum of Hypotheses with Confidence Limits for Unlikely Hypotheses Taken at ±2.5σ. From left to right the scale is divided into unlikely hypotheses, doubtful hypotheses (?), likely hypotheses, doubtful hypotheses (?), and unlikely hypotheses, with points at −2.5σ, −2.0σ, −1.0σ, 0, +1.0σ, +2.0σ, and +2.5σ corresponding to 48%, 48.8%, 50.4%, 52%, 53.6%, 55.2%, and 56%; σ% = 1.6%.]
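Testing a continuum of hypotheses amounts to computing T for each candidate parameter value and sorting the results into the regions of Fig. 13:3. This sketch uses the text's T criteria of 2.0 (likely) and 2.5 (unlikely) and σ% = 1.6%. It works in integer tenths of a percentage point so that boundary cases such as 48.8% compare exactly; the names are hypothetical.

```python
# Work in tenths of a percentage point so that boundary cases are exact.
SAMPLE_TENTHS = 520               # sample result: 52.0% men
SIGMA_TENTHS = 16                 # standard error of 1.6%, as computed in the text
LIKELY_T, UNLIKELY_T = 2.0, 2.5   # T-ratio criteria fixed in advance

def judge_hypothesis(h_tenths: int) -> str:
    """Classify a hypothetical parameter (in tenths of a percent) by its |T| ratio."""
    t = abs(SAMPLE_TENTHS - h_tenths) / SIGMA_TENTHS
    if t <= LIKELY_T:
        return "likely"
    if t >= UNLIKELY_T:
        return "unlikely"
    return "doubtful"

for h in range(460, 581, 4):  # hypotheses from 46.0% to 58.0% in steps of 0.4%
    print(f"{h / 10:4.1f}%  {judge_hypothesis(h)}")
```

The sweep reproduces the figure: hypotheses from 48.8% to 55.2% are likely, those at or beyond 48% and 56% are unlikely, and the narrow bands in between are doubtful.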

The basic point to be remembered is this: We cannot establish the exact 
value of a parameter from the study of only sample results. However, we can 
establish a range of possible or likely parameter values, and a range of unlikely 
or untenable parameter values. The narrower the range of likely parameter 
values, the more precise (or reliable) the result will be. 

Confidence criteria are used to establish the limits of likely hypotheses for a given research problem. Thus, we may take a T ratio of 2.0 (or a probability value of .05) as the criterion for the limits of tenable hypotheses, and a T ratio of 2.5 (or a probability value of .01) as the criterion for the limits of unlikely or untenable hypotheses. These limits are illustrated in Fig. 13:3. The limiting parameter values of tenable or likely hypotheses are readily obtained, since they are equal to the value of the sample percentage, 52%, plus and minus twice the standard error of the percentage. The standard error is equal to

$$\sigma_{\%} = 100\sqrt{\frac{(.52)(.48)}{1000}} = 1.6\%$$

and the sample value of 52% ± 2.0σ% is equal to

52% ± 2.0(1.6%) = 48.8% and 55.2%

These are the limits of likely hypotheses when a T ratio criterion of 2.0 is 
employed. Thus the sample result of 52% would yield a T ratio of 2.0 or less 
for the hypothesis that the parameter percentage is 48.8% (the lower limit), 
as well as for the hypothesis that it is 55.2% (the upper limit). 
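The confidence-limit computations in this section follow one pattern: sample percentage ± T × σ%. A minimal sketch, assuming (as the worked examples appear to do) that p is taken from the sample itself and that σ% is rounded to one decimal before the limits are formed; the function name and the `se_decimals` parameter are hypothetical.

```python
import math
from typing import Optional, Tuple

def confidence_limits(sample_pct: float, n: int, t: float,
                      se_decimals: Optional[int] = 1) -> Tuple[float, float]:
    """Limits sample_pct +/- t * sigma_%, with sigma_% = 100 * sqrt(p * q / N_s).

    p is taken from the sample; sigma_% is rounded to se_decimals places
    (pass None to keep it unrounded).
    """
    p = sample_pct / 100.0
    se = 100.0 * math.sqrt(p * (1.0 - p) / n)
    if se_decimals is not None:
        se = round(se, se_decimals)
    return sample_pct - t * se, sample_pct + t * se

print(confidence_limits(52, 1000, 2.0))  # limits near 48.8 and 55.2
print(confidence_limits(52, 1000, 2.5))  # limits near 48 and 56
print(confidence_limits(70, 1000, 2.5))  # limits near 66.5 and 73.5
```

With `se_decimals=None` the 2.5σ limits for the 52% result come out near 48.05% and 55.95%, which round to the same 48% and 56%.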





We are now prepared to answer the merchant's question. It appears likely that between 49% and 55% of his customers are men. However, we can be much more confident in our report if we use the criteria for unlikely, rather than likely, results. This is the case because the possible parameter value will lie in the continuum between those values that can confidently be rejected, rather than within the narrower range of "likely" hypotheses. If we use a T ratio of 2.5 as the criterion for unlikely results, we have the following:

52% ± 2.5(1.6%) = 48% and 56%

These are the limiting values of hypotheses which we can reject with confidence as untenable. In other words, it is unlikely that the random sample of 1000 customers came from a universe of customers of which more than 56% or less than 48% were men. Conversely, these values may also be used in this particular problem as the limits of tenable or likely hypotheses. They are in fact limits in which we can have even greater confidence than the 49% and 55% limits already cited. This is so because we are allowing for a wider margin of error when we report 48% and 56% to the merchant as the most likely range within which his men customers are to be found. Market-wise, i.e., in so far as his buying, selling, and advertising policies are concerned, these results are sufficiently precise for the merchant to assume that 50% of his customers are men. However, it should be emphasized that any attempt on his part to "forecast" the future percentages of his men customers will result in headaches unless the conditions obtaining while his random sample was drawn continue to operate. Since he usually cannot be sure about this, the best thing for him to do is to sample his customers periodically. By such successive sampling, and with care in differentiating chance differences from real differences, he can determine the trend in the sex of his customers.

Let us apply the criteria we have developed for evaluating the significance 
of the result in another example. We shall assume that a second merchant 
takes a random sample of 1000 of his customers and finds that 70% are 
women. What are the limits for the hypotheses which we can confidently 
reject? The standard error for the continuum of hypotheses is computed as follows:

$$\sigma_{\%} = 100\sqrt{\frac{(.70)(.30)}{1000}} = 1.4\%$$

and

70% ± 2.5(1.4%) = 66.5% and 73.5%

These are the limiting values for hypotheses that are definitely untenable. 
We can be confident that this merchant obtained his random sample from a 
universe of customers at least 66% but not more than 74% of whom are 
women. In other words, we can be confident that from about 2/3 to nearly 
3/4 of his customers are women. 




Fiducial Limits and Confidence Limits 

R. A. Fisher has employed the concept fiducial limits to characterize the limits for unlikely hypotheses. However, we prefer the phrase confidence limits because it is more descriptive of what they represent.

We have already said that the nature of scientific method is such that 
hypotheses can never be completely verified in a strict logical sense. But it is 
of course possible to establish likely hypotheses through the refutation of 
unlikely ones. Giving the facts a chance to nullify a hypothesis is of the 
essence of a Test of Significance. 

The Reliability of a Statistic 

The confidence or fiducial limits of a statistic are often interpreted as setting the limits of the "reliability" of a sample result. A measure is the more reliable, the smaller the range of its confidence limits. Since, in random sampling, errors of sampling decrease as the size of the sample is increased, large random samples yield results whose confidence limits are indicative of a fairly precise result. It is again emphasized, however, that simply increasing the size of a sample does not necessarily increase the adequacy of the result. Only if the sample is a random or stratified-random sample of the universe studied can we have full confidence in the results of any Tests of Significance.
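How the range of the confidence limits shrinks with sample size follows from the √N_s in the denominator of σ%. This sketch (hypothetical function name, p = .52 and the 2.5σ criterion from the merchant example) shows only the arithmetic. It cannot show the adequacy of the sample, which depends on random or stratified-random selection, as the text stresses.

```python
import math

def limits_width(p: float, n: int, t: float = 2.5) -> float:
    """Width, in percentage points, of the interval p +/- t * sigma_% for a sample of n."""
    return 2.0 * t * 100.0 * math.sqrt(p * (1.0 - p) / n)

for n in (250, 1000, 4000, 16000):
    print(f"N_s = {n:>5}: width = {limits_width(0.52, n):.2f} points")
```

Each fourfold increase in N_s halves the width of the limits, so gains in precision come ever more slowly as the sample grows.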

C. SUMMARY OF STEPS FOR THE TESTING OF HYPOTHESES

1. The formulation of a general hypothesis. (In a research investigation, 
a hypothesis more often takes the form of a question, or the statement of a 
problem. However, the formulation of a precise statistical hypothesis (Step 6) 
is eventually necessary in order that a Test of Significance may be set up and 
the empirical data be permitted to reveal their implications.) 

2. The definition of the statistical universe to be studied. (This is determined in part from the formulation of the problem in Step 1; in the example used above, however, the investigator has to decide whether the merchant's "universe" will be drawn from customers over a period of only a few weeks or of months, etc.)

3. The designing of a research investigation in such a way that an adequate 
sample of data will be obtained for the universe to be investigated. (To be 
adequate, a method of sampling that will yield a random or a stratified- 
random sample must be used.) 

4. The enumeration or measurement of the characteristics of the sample 
that are relevant to the investigation (in the above example, the counting of 
men and of all other customers in the sample). 

5. The statistical organization and summarization of the sample data 
obtained in Step 4, by means of appropriate methods of descriptive statistics 
(the computation of the percentage of men customers). 





6. The selection of one or more relevant statistical hypotheses. (This step consists in postulating the value of at least one parameter for the universe studied, for example, a proportion of .50, or a mean I.Q. of 100, or a difference of zero between two means. The choice of statistical hypotheses relevant to the general hypothesis or problem of an investigation is closely related to Step 1.)

7. The formulation of a Test of Significance, viz.,

$$T = \frac{s - h}{\sigma_s}$$

where s is the value of a statistic derived from the sample result; h is the parameter value for the hypothesis to be tested; and σ_s is the standard error of the statistic.

8. The computation of the standard error of the statistic (or statistics) 
obtained in Step 5, by means of the appropriate formula for the standard 
deviation of the hypothetical sampling distribution of the statistic. (This 
step, as we have seen, is necessarily based on some assumption about the 
form of the sampling distribution of the statistic.) * 

9. The computation of the Test Ratio, T, from the Test of Significance. 

10. The estimation, from the T ratio, of the probability of a given result. (In the case of statistics whose sampling distribution can be assumed to be similar in form to that of the standard, normal probability curve, the differentiation of the area of this curve for T, given in Table II, Appendix B, is the basis for this estimate. For small samples, the Student-Fisher table of probability values for the t ratio (Table III, Appendix B) is used. For hypotheses concerning the distribution of frequencies, the method of chi-square in Chapter 15, and the corresponding probability values in Table IV, Appendix B, may be used.)

11. An inference whether the sample result is likely or unlikely for the hypothesis tested, i.e., whether or not, in the light of experience and all relevant evidence, we judge that it will or will not occur. (This inference is usually based upon confidence criteria which should be set up before the result is actually obtained.)

12. A conclusion, formulated in the light of Step 11, concerning the general hypothesis or problem of the investigation.
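Steps 6 through 11 can be condensed into a single function for the case of a sample percentage. This is a sketch under stated assumptions, not the author's procedure verbatim: the sampling distribution is taken as normal, σ% is left unrounded (so T for the 52% example is about 1.26 rather than 1.25), P is the one-tailed probability in the direction observed, and the function name and default criteria are hypothetical.

```python
import math

def significance_test_percentage(sample_pct: float, hyp_pct: float, n: int,
                                 likely_p: float = 0.05, unlikely_p: float = 0.01):
    """Steps 6-11 for a sample percentage; returns (T, P, verdict).

    likely_p and unlikely_p are confidence criteria that should be fixed
    before the data are examined.
    """
    # Step 6: the statistical hypothesis is the parameter value hyp_pct.
    p = hyp_pct / 100.0
    # Step 8: standard error of the statistic under the hypothesis.
    sigma = 100.0 * math.sqrt(p * (1.0 - p) / n)
    # Steps 7 and 9: the Test of Significance and its ratio T = (s - h) / sigma_s.
    t = (sample_pct - hyp_pct) / sigma
    # Step 10: probability of a result at least this far out, in the direction observed.
    prob = 0.5 * math.erfc(abs(t) / math.sqrt(2.0))
    # Step 11: inference from the confidence criteria.
    if prob >= likely_p:
        verdict = "likely"
    elif prob <= unlikely_p:
        verdict = "unlikely"
    else:
        verdict = "doubtful"
    return t, prob, verdict

for hyp in (50.0, 75.0):
    t, prob, verdict = significance_test_percentage(52.0, hyp, 1000)
    print(f"h = {hyp}%: T = {t:.2f}, P = {prob:.3g}, {verdict}")
```

Steps 1 through 5 and Step 12 (defining the universe, drawing a random sample, and drawing conclusions about the general problem) remain matters of research design that no function can perform.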

In many research problems, as in the examples used above, the procedures from Step 6 on are short-cut by establishing confidence (or fiducial) limits that indicate the precision of the result. However, it should be emphasized that this short-cut implies many Tests of Significance of a continuum of hypotheses.

* Most of the measurements of standard errors in large sample theory are developed on the assumption that the form of the theoretical sampling distribution is similar to the standard, normal probability curve. Certain exceptions to this were noted in Chapter 12; however, when the statistic under consideration is derived from a sample distribution of measurements whose form does not differ significantly (in kurtosis and skewness) from that of the standard, normal probability curve, we are warranted in assuming that the form of the sampling distribution of the statistic is "normal."


D. TESTS OF SIGNIFICANCE FOR SOME COMMONLY USED STATISTICS

In the preceding sections we have seen that a Test of Significance requires 
three types of values: 

1. A statistic derived from the results of a random or a stratified-random 
sample of a universe. 

2. The parameter value of a relevant hypothesis. 

3. A measure or estimate of the standard error of the sampling distribution 
of the statistic under consideration. 

The logic underlying the interpretation of a Test of Significance is similar for all types of statistics. However, as indicated earlier, the sampling distributions may have different forms. Furthermore, in the case of chi-square, differences in sampling distributions are based on the concept of degrees of freedom (d.f.), rather than on N_s, the size of the sample (cf. Chapter 15).

In the remainder of this chapter we shall present standard error formulas and Tests of Significance for commonly used statistics whose sampling distribution can be assumed to be similar in form to the standard, normal probability curve. Under these conditions, the probability value of a sample result will be based on the differentiation of the normal probability integral in Table I, Appendix B, and set up directly for T in Table II of that appendix.


Tests of Significance for Percentages 

The example used in Sections A and B involved percentage statistics. We saw that the estimate of the standard error of the sampling distribution of percentages is:

$$\sigma_{\%} = 100\sqrt{\frac{pq}{N_s}} \qquad [13:1]\ \text{(standard error of a percentage)}$$

where p is the proportion of measurements or occurrences of a given class, and q is, by definition, always equal to 1.0 − p.*

We shall present another Test of Significance for this statistic. Several years ago Dr. George Gallup published a survey result which indicated that "64% of voters called federal rationing fair despite some grumbling."† The question asked in this poll, whose object was to secure a picture of people's attitudes toward rationing, was:

DO YOU THINK THE RATIONING OF VARIOUS PRODUCTS IS BEING HANDLED FAIRLY?

* Note that the computation of this standard error is facilitated by reference to Table VII, Appendix B.

† New York Times, January 10, 1943.

The following results were reported:

Yes            64%
No             29%
No Opinion      7%
Total         100%


One question to be asked about these results is whether it is likely that a majority of the voters at the time thought rationing was being handled fairly. Since 64% of the sample answered "yes," it is possible that at least 51% (a simple majority) of the population would have answered likewise. A Test of Significance quickly indicates whether this supposition is correct. Since the actual size of the sample is not stated in the report, we shall assume that N_s was equal to at least 4000 people. The Test of Significance is, therefore, as follows:

$$T = \frac{64\% - 51\%}{100\sqrt{\frac{(.51)(.49)}{4000}}} = \frac{13\%}{0.79\%} = 16.5$$

where 64% is the sample result; 51% is the parameter value of the hypothesis tested; .51 is the parameter proportion for the hypothesis (q = .49 signifies both those who had no opinion as well as those whose answers were "no"); and 4000 is the assumed size of the sample.

The T ratio yielded by this Test of Significance is 16.5. According to the confidence criteria described in Section B, this result is most unlikely for the hypothesis tested, for the T ratio is many times greater than 2.5. Since this particular hypothesis is untenable, we can conclude that more than a simple majority of the universe of voters believed that the rationing of various products was being handled fairly.

It is also relevant here to determine confidence limits, either for those who
thought rationing was being handled fairly, or for those who thought it was
being handled unfairly. Let us consider the latter. We shall need, therefore,
the limiting values of those hypotheses that can be rejected as definitely
untenable for a percentage of 29% (the “no’s”). We shall take the confidence
limits as equal to $29\% \pm 2.5\sigma_{\%}$. Thus:

$$29\% \pm 2.5\left(100\sqrt{\frac{(.29)(.71)}{4000}}\right) = 29\% \pm 2.5(.72\%) = 27.2\% \text{ and } 30.8\%$$

where $p = .29$ is the average parameter proportion for the continuum of
hypotheses used to establish the confidence limits; and $q$ is .71.

The confidence limits in rounded values are 27% and 31%. Hence we can 
be quite confident that at least 27% but not more than 31% of the universe of 
voters sampled were of the opinion that the rationing of various products 
was not being handled fairly (assuming a properly drawn sample). 
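The percentage computations above can be checked with a short script. This is a minimal sketch, assuming simple random sampling and the text's assumed sample of 4000 (the poll's actual size was not reported); the helper names `se_percentage` and `t_ratio` are illustrative only. Unrounded, T comes out near 16.45; the text's 16.5 divides by the rounded standard error of 0.79%.

```python
import math

def se_percentage(p, n):
    """Standard error of a percentage: 100 * sqrt(p * q / n), with q = 1 - p."""
    return 100 * math.sqrt(p * (1 - p) / n)

def t_ratio(statistic, hypothesis, se):
    """Test ratio: (sample statistic - hypothesized parameter) / standard error."""
    return (statistic - hypothesis) / se

N = 4000  # sample size assumed in the text; the poll report did not state it

# Hypothesis: only a bare majority (51%) of the universe would answer "yes"
se_yes = se_percentage(0.51, N)     # about 0.79 percentage points
t = t_ratio(64, 51, se_yes)         # about 16.45 (16.5 in the text, using 0.79)

# Confidence limits for the 29% "no" answers, T criterion of 2.5
se_no = se_percentage(0.29, N)      # about 0.72 percentage points
lower, upper = 29 - 2.5 * se_no, 29 + 2.5 * se_no

print(f"SE = {se_yes:.2f}%, T = {t:.2f}")
print(f"limits: {lower:.1f}% and {upper:.1f}%")
```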



TESTS OF SIGNIFICANCE FOR PROPORTIONS AND FREQUENCIES 375 


Tests of Significance for Proportions 


Tests of Significance for proportion statistics are identical with those for 
percentage statistics, and therefore require no further illustration. The 
standard error of a proportion is as follows: 


$$\sigma_p = \sqrt{\frac{pq}{N_s}}$$
[13:2] Standard error of a proportion


It is relevant, however, to consider the following problem, which frequently
arises with both proportion and percentage statistics: How large must a sample
proportion be to indicate a result greater than would be expected on the basis of
chance for a given hypothesis? Consider, for example, the problem of determining
whether there is a difference in the taste of two cola beverages. If
samples of the two cola drinks, A and B, are presented in random order
to a subject over a period of 25 trials under appropriately controlled
experimental conditions, then solely on the basis of chance we would expect the
subject’s taste judgments to be correct 50% of the time (or an average of
12.5 correct trials in a series of 25 trials).

What proportion of correct judgments does a subject have to give in order 
to indicate that he can really taste a difference and is not just guessing? 
If we take the confidence criterion of 3.0 for unlikely hypotheses and restate
the formula for T,

$$T = \frac{p_s - p_h}{\sigma_p},$$

in terms of $p_s$, the required statistic, then

$$p_s = \sigma_p T + p_h$$
[13:3] To determine the value of a statistic needed for the rejection of a hypothesis

$$p_s = \sqrt{\frac{(.50)(.50)}{25}}\,(3.0) + .50 = .10(3.0) + .50 = .80$$


where .50 is the parameter proportion of the hypothesis tested; 25 is the
number of trials, $N_s$; and 3.0 is the confidence criterion. Thus, if a subject
makes .80 or more of his judgments correctly, we can reject the chance
hypothesis. In a series of 25 trials, .80 is equal to 20 correct judgments;
consequently 20 or more correct judgments will warrant the rejection of the chance
hypothesis and signify that a subject can taste a difference between the two
beverages.
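Formula 13:3 can be sketched as a small function. The name `critical_proportion` is hypothetical, and the rounding step before `math.ceil` only guards against floating-point noise when the proportion is converted to a count of judgments.

```python
import math

def critical_proportion(p_h, n, t_criterion):
    """Formula 13:3: p_s = sigma_p * T + p_h, with sigma_p = sqrt(p_h * q_h / n)."""
    sigma_p = math.sqrt(p_h * (1 - p_h) / n)
    return sigma_p * t_criterion + p_h

p_needed = critical_proportion(0.50, 25, 3.0)            # .10 * 3.0 + .50 = .80
# Smallest whole number of correct judgments at or above that proportion
judgments_needed = math.ceil(round(p_needed * 25, 9))     # 20 of 25 trials
print(round(p_needed, 2), judgments_needed)
```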


Tests of Significance for Frequencies 

Sometimes it is more convenient or desirable to evaluate sample data in 
terms of frequencies than in proportions or percentages. We saw previously 
that the standard error of a frequency is equal to

$$\sigma_f = \sqrt{N_s pq}$$
[13:4] Standard error of a frequency





We could have employed this formula in the preceding example and directly 
determined the frequency of correct judgments necessary for the rejection of 
the chance hypothesis. Thus, where 

$$T = \frac{f_s - f_h}{\sigma_f}, \qquad f_s = \sigma_f T + f_h$$

Therefore,

$$f_s = \sqrt{25(.50)(.50)}\,(3.0) + 12.5 = 20.0$$

This result is of course the same as that with Formula 13:3; that is, at least 
20 correct judgments in 25 trials are necessary for us to reject the chance 
hypothesis with confidence (by the criterion of T = 3.0) and warrant the 
conclusion that a subject can really taste a difference. 
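The frequency form gives the same critical count. As an optional cross-check that goes beyond the text's normal-curve method, the sketch also computes the exact binomial probability of 20 or more correct judgments in 25 trials under pure guessing ($p = .50$); `critical_frequency` is an illustrative name, not from the text.

```python
import math

def critical_frequency(p_h, n, t_criterion):
    """f_s = sigma_f * T + f_h, with sigma_f = sqrt(n * p * q) (Formula 13:4)
    and f_h = n * p_h, the frequency expected under the hypothesis."""
    sigma_f = math.sqrt(n * p_h * (1 - p_h))
    return sigma_f * t_criterion + n * p_h

f_needed = critical_frequency(0.50, 25, 3.0)   # 2.5 * 3.0 + 12.5 = 20.0

# Cross-check, not part of the text's method: exact probability of 20 or more
# correct judgments out of 25 when every judgment is a 50-50 guess.
p_tail = sum(math.comb(25, k) for k in range(20, 26)) / 2 ** 25
print(f_needed, round(p_tail, 4))
```

The exact tail probability is about .002, so the binomial view agrees that 20 correct judgments would rarely arise from guessing alone.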

An evaluation of sample results in terms of the frequencies of a class of 
events rather than of proportions or percentages is useful at times. Generally, 
however, it is better to convert a frequency to a proportion or percentage. 


Tests of Significance for the Arithmetic Mean 

The standard error of a sampling distribution of means is given by the
following:

$$\sigma_M = \frac{\sigma_u}{\sqrt{N_s}}$$
[13:5] Standard error of the arithmetic mean

where $\sigma_u$ is the standard deviation of the measures of the universe from
which the sample was drawn and, as usual, $N_s$ is the size of the sample. This
formula ordinarily cannot be used, however, because the standard deviation
of the universe being sampled is usually not known. Consequently, the
estimate of the standard error of a mean must be based on the standard
deviation of the distribution of a sample result. The formula for the standard
error of a mean therefore becomes

$$\sigma_M = \frac{\sigma}{\sqrt{N_s - 1}}$$
[13:5a] Standard error of the arithmetic mean

where $\sigma$ signifies the standard deviation of the distribution of a sample result,
and $N_s$ is the size of the sample.

We shall illustrate the use of this formula by means of a random sample of
300 Stanford-Binet I.Q. scores of high-school sophomores in a large city. The
mean I.Q. of this sample was found to be 108 and the standard deviation of
the distribution was found to be 12.

Inasmuch as the mean I.Q. of an unrestricted universe at this maturity
level (in the United States) is assumed to be 100, we may inquire whether the
result obtained for this sample of high-school sophomores is likely to hold
for the more general, unrestricted universe. We can readily answer this by
applying a Test of Significance in which the parameter mean is taken as 100:

$$T = \frac{M_s - M_h}{\sigma_M} = \frac{108 - 100}{\frac{12}{\sqrt{300 - 1}}} = \frac{8}{.69} = 11.6$$




TESTS OF SIGNIFICANCE FOR THE ARITHMETIC MEAN 377 

where .69 of one I.Q. unit is the estimated standard error of the sampling 
distribution, and the difference between the sample result and the mean of the 
hypothesis tested is 8 I.Q. units. 

Since the Test of Significance yields a T ratio of 11.6, we can be confident 
that the sample mean I.Q., 108, was not derived from a universe whose mean 
I.Q. is 100. In other words, we can be quite certain that the high-school 
sophomores of that particular city have a mean I.Q. higher than 100. 

Confidence limits for the universe of that city’s high-school sophomores 
would be as follows, with a T criterion of 2.5: 

108 ± 2.5(.69) = 106.3 and 109.7 I.Q. scores 

where 108 (the sample mean) is taken as the representative parameter value 
of the hypothetical means of a continuum of hypotheses, and .69 is the 
estimated standard error of the mean. The limits set by the criterion of 2.5 T 
are I.Q.’s of 106.3 and 109.7. We can therefore be confident that the given 
universe of high-school sophomores has a mean I.Q. of at least 106.3 but not 
more than 109.7. 
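The I.Q. example can be reproduced as follows, a sketch using the text's figures (mean 108, SD 12, $N_s = 300$). The unrounded standard error gives T of about 11.5; the text's 11.6 results from dividing by the rounded value .69. Either way the hypothesis of a mean of 100 is rejected.

```python
import math

def se_mean(sd_sample, n):
    """Formula 13:5a: estimated standard error of a mean from the sample SD."""
    return sd_sample / math.sqrt(n - 1)

M_s, M_h, sd, n = 108, 100, 12, 300        # values from the I.Q. example
se = se_mean(sd, n)                         # about .694
t = (M_s - M_h) / se                        # about 11.5 unrounded (11.6 with se = .69)
limits = (M_s - 2.5 * se, M_s + 2.5 * se)   # about 106.3 and 109.7

# Using sqrt(n) instead of sqrt(n - 1) barely matters for a sample this large
se_large = sd / math.sqrt(n)                # about .693
print(round(se, 3), round(se_large, 3), round(t, 1), [round(x, 1) for x in limits])
```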

In using Formula 13:5a, it should be apparent that the subtraction of one
case from $N_s$ makes very little difference in the computed value of the
standard error when the sample consists of a fairly large number of cases. Thus,
in the preceding example, 12 divided by $\sqrt{300 - 1}$ is equal to .694, whereas
12 divided by $\sqrt{300}$ is equal to .693. Both computations give .69 as the
standard error of the mean. Consequently, the result of the Test of Significance
and the confidence limits would have been the same had $\sqrt{N_s}$ instead
of $\sqrt{N_s - 1}$ been used. In practice $\sqrt{N_s}$ is usually used instead of
$\sqrt{N_s - 1}$ for samples of 30 or more cases, i.e., for the samples to which
large-sample theory applies.

The Reliability of a Mean 

In an earlier section, we saw that the confidence limits of a statistic are
often interpreted as setting the limits of the reliability of a measure. If this
interpretation is applied in the preceding example, the confidence limits of
106.3 and 109.7 I.Q. units are an index of the reliability of the mean I.Q. for
the universe sampled. Since a spread of less than 3.5 I.Q. points is very small
from a psychological point of view, the mean of this sample is very reliable:
we would expect the random sample to have been obtained from a universe
whose mean lies between 106.3 and 109.7. The estimated range of possible
parameter values for the universe is relatively small.

In psychological measurement and research in related fields, the reliability 
of a result must usually be judged on a relative basis. That is, there are no 
absolute standards of reliability. What we can do is to evaluate the reliability 
of the result in relation to the practical meaning or implications of the
measurements analyzed.





Tests of Significance for Test Scores and Other Measures 

It is possible to test the significance not only of mean results for a sample
but also of different test scores or measures of a distribution. This is a
particularly useful type of evaluation because of the importance attached to
particular scores or measures in psychological testing. In fact, confidence
limits for a test score provide the most practical and meaningful basis for
evaluating the reliability of a test. If the confidence limits are relatively wide,
the test itself may have little or no empirical usefulness for individual
diagnosis and prognosis. On the other hand, if they are relatively narrow, the test
can have considerable usefulness.

A Test of Significance for a measure of a distribution can be well illustrated 
by I.Q. scores since the scale of I.Q. scores itself is familiar in psychology. 
First, we need the standard error of a measure (or test score). It is estimated
as equal to the following:

$$\sigma_X = \sigma_x\sqrt{1 - r_{xx'}}$$
[13:6] Standard error of a measure

where the subscript X symbolizes a measure of a variable or test, x; $\sigma_x$ is
the standard deviation of the distribution of the variable; and $r_{xx'}$ is the
reliability coefficient of the variable or test (cf. Chapter 17, Section B).

If we take the reliability of the Stanford-Binet* as $r_{xx'} = .91$, and the
variability, $\sigma_x$, as equal to 15 I.Q. units, then

$$\sigma_X = 15\sqrt{1 - .91} = 15(.30) = 4.5 \text{ I.Q. units}$$

Is a Binet I.Q. score of 105 significantly greater than a mean I.Q. score of 100? 
A Test of Significance will quickly answer this question. 

$$T = \frac{X_s - X_h}{\sigma_X} = \frac{105 - 100}{4.5} = 1.1$$

The T ratio, 1.1, signifies a result that is very likely for the hypothesis tested.
In other words, an I.Q. score of 105 is not significantly greater than an I.Q.
of 100 (i.e., 105 would be expected on the basis of chance errors of sampling),
and consequently the psychometrician would not be warranted in concluding
that an I.Q. of 105 signifies more Stanford-Binet intelligence than an
average I.Q. of 100.
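A minimal sketch of Formula 13:6 and the test for an I.Q. of 105, assuming the reliability (.91) and standard deviation (15) quoted above; `se_measure` is an illustrative name.

```python
import math

def se_measure(sd, reliability):
    """Formula 13:6: standard error of a single measure (test score)."""
    return sd * math.sqrt(1 - reliability)

se_x = se_measure(15, 0.91)     # 15 * .30 = 4.5 I.Q. units
t = (105 - 100) / se_x          # about 1.1, well under a criterion of 2.5
print(round(se_x, 2), round(t, 2))
```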

The Reliability of Test Scores 

Confidence limits for I.Q. scores, or any other measures of a variable, can 
readily be established by means of Formula 13:6. However, test scores or 
other measures of a variable do not always have the same degree of reliability 
(or standard error) at all points of a scale. Formula 13:6 is most applicable
to scores around the mean of a distribution. It may yield either too small or 

* Cf. L. M. Terman and M. A. Merrill, Measuring Intelligence, Houghton Mifflin,
Boston, 1937, chap. 3.



TESTS OF SIGNIFICANCE FOR STANDARD DEVIATIONS 




too large a measure of error for extreme values, depending upon the nature
of a given variable. Thus, Terman and Merrill found that the standard error
of Stanford-Binet I.Q. scores increases as the I.Q. increases; it ranges from
a $\sigma_{I.Q.}$ of 2.2 for I.Q.’s below 70 to a $\sigma_{I.Q.}$ of 5.2 for I.Q.’s of 130 and over,
with a $\sigma_{I.Q.}$ of 4.5 for I.Q.’s between 90 and 109. I.Q. scores on the basis of
which feeble-mindedness or mental deficiency is inferred are therefore
considerably more reliable than I.Q. scores that signify superior intelligence.

Confidence limits in terms of a T ratio criterion of 2.0 for likely hypotheses
are as follows for a Stanford-Binet I.Q. score of 109:

$$X_{I.Q.} \pm 2.0(\sigma_X) = 109 \pm 2.0(4.5) = 100.0 \text{ and } 118.0$$

Confidence limits in terms of a T ratio criterion of 3.0 are:

$$109 \pm 3.0(4.5) = 95.5 \text{ and } 122.5$$

Thus it is likely that persons with an I.Q. score of 109 have parameter
scores whose values lie between 100.0 and 118.0; and we can be quite confident
that the parameter values will not be less than 95.5 or greater than 122.5.

One qualification should be made regarding this interpretation of the
reliability of a test score, namely, it is made on the assumption that only
chance errors of sampling and of measurement are responsible for the
variation or difference from the parameter value, whatever it may be. The
interpretation does not take into account the constant error factors that might
affect the I.Q. score positively or negatively, such as coaching in the
particular items (positive bias) or inadequate rapport in the test situation (negative
bias).

The reliability of a test score is often expressed in relative terms, viz., the
ratio of $\sigma_X$ to $\sigma$, the standard deviation of the sample distribution. In the
case of a Stanford-Binet I.Q. near the mean, this ratio would be 4.5/15.0, or
.30 (approximately 1/3). Thus the effect of chance errors of sampling and
measurement on I.Q. scores for this test is about one-third as great as the standard
deviation of the total distribution.

The reliability coefficient, $r_{xx'}$, in Formula 13:6 is often obtained by
correlating test results from the same sample of individuals on (1) alternate forms
of a test, or (2) a second administration of it. If the variability of the two sets
of results differs, the best estimate of $\sigma_X$ is obtained by taking the average
of their respective standard deviations:

$$\sigma_X = \frac{\sigma_x + \sigma_{x'}}{2}\sqrt{1 - r_{xx'}}$$
[13:6a] Standard error of a measure (when two variability estimates of the test are available)
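Formula 13:6a differs from 13:6 only in averaging the two standard deviations. The standard deviations in this sketch (14 and 16) are hypothetical values chosen for illustration, not data from the text; the function name is likewise illustrative.

```python
import math

def se_measure_two_forms(sd_form1, sd_form2, reliability):
    """Formula 13:6a: the average of the two SDs replaces the single SD of 13:6."""
    return (sd_form1 + sd_form2) / 2 * math.sqrt(1 - reliability)

# Hypothetical SDs for two forms of a test (not data from the text)
se_two = se_measure_two_forms(14.0, 16.0, 0.91)   # 15 * .30 = 4.5
print(round(se_two, 2))
```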

Tests of Significance for Standard Deviations 

Occasionally, in research situations, the reliability of the standard deviation
of a sample result must be determined. This is usually estimated in terms





of the confidence limits for untenable hypotheses. The best estimate of the
standard error of a standard deviation is provided by the following formula:

$$\sigma_\sigma = \frac{\sigma}{\sqrt{2N_s}} \quad\text{or}\quad \frac{.707\,\sigma}{\sqrt{N_s}}$$
[13:7] Standard error of the standard deviation

where $\sigma$ in the numerator is the standard deviation of the distribution of the
sample result, and $N_s$ is the size of the sample.

The use of this formula will be illustrated by Brigham’s Army Alpha data, 
published in his Study of American Intelligence. For a sample of 81,465 native- 
born whites drafted in World War I, Brigham obtained a mean mental age 
score, calculated from the Alpha test results, of 13.77 years. The standard 
deviation of this large distribution of mental age scores was 2.86 years. The 
size of the sample and the standard deviation of the result being known, the 
standard error of the group’s variability in mental age scores is equal to the 
following: 

$$\sigma_\sigma = \frac{2.86}{\sqrt{2(81{,}465)}} = .007 \text{ year of mental age}$$

The standard deviation obtained with the sample result is a precise estimate
of the variability of the total universe, consisting in this case of all World War I
white draftees born in the United States. The reliability or precision of the
result in terms of confidence criteria equal to a T ratio of 2.5 is as follows:

2.86 ± 2.5(.007) = 2.84 and 2.88 years of mental age 

Such a precise result is of course to be expected from so large a sample; 
and, provided the sample was a random sample of the universe studied, we 
can have great confidence in the accuracy of the measure of variability it 
yielded. 
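Brigham's figures can be run through Formula 13:7 as a check. This sketch also confirms that the $\sigma/\sqrt{2N_s}$ and $.707\sigma/\sqrt{N_s}$ forms agree; `se_sd` is an illustrative name.

```python
import math

def se_sd(sd, n):
    """Formula 13:7: standard error of a standard deviation, sd / sqrt(2 * n)."""
    return sd / math.sqrt(2 * n)

se = se_sd(2.86, 81_465)                      # Army Alpha figures from the text
limits = (2.86 - 2.5 * se, 2.86 + 2.5 * se)   # T criterion of 2.5
alt_form = 0.707 * 2.86 / math.sqrt(81_465)   # equivalent .707 form
print(round(se, 3), round(alt_form, 3), [round(x, 2) for x in limits])
```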


The Standard Error of the Average Deviation 

Although the average deviation is used less frequently than the standard 
deviation as a measure of variability, it is used sufficiently often to make it 
worth while for us to know what the standard error of such a statistic is. It 
is obtained by the following formula, taken in terms of the standard deviation: 

$$\sigma_{AD} = \frac{0.603\,\sigma}{\sqrt{N_s}}$$
[13:8] Standard error of the average deviation

where as usual $\sigma$ is the standard deviation of the distribution of the sample
result and $N_s$ is the size of the sample.

The standard error of the average deviation can also be estimated from the
average deviation. It is equal to the following:

$$\sigma_{AD} = \frac{0.756\,\text{A.D.}}{\sqrt{N_s}}$$
[13:8a] In terms of A.D.



TESTS OF SIGNIFICANCE FOR CENTILES 381 

where A.D. is the average deviation of the sample distribution and $N_s$ is the
size of the sample.
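The two average-deviation formulas are consistent with each other for a normal distribution, where A.D. $= \sigma\sqrt{2/\pi} \approx .798\sigma$. The sketch below checks this and compares the coefficient with the large-sample result that the variance of the mean absolute deviation of normal data is $\sigma^2(1 - 2/\pi)/N_s$; that result is brought in here as a check and is not stated in the text.

```python
import math

# For normal data the average deviation is sigma * sqrt(2/pi), so the A.D. form
# (13:8a) should reproduce the sigma form (13:8).
ad_per_sigma = math.sqrt(2 / math.pi)      # about .798
coef_from_ad = 0.756 * ad_per_sigma        # about .603, the 13:8 coefficient

# Large-sample SE of the mean absolute deviation under normality:
# sigma * sqrt(1 - 2/pi) / sqrt(N_s)
coef_theory = math.sqrt(1 - 2 / math.pi)   # about .603
print(round(coef_from_ad, 3), round(coef_theory, 3))
```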


Tests of Significance for Centiles

The centile method for descriptive statistics was presented in Chapter 6 
for any type of distribution. When a sampling distribution for any centile 
measure can be assumed to have the form of the standard, normal probability
curve, the measures of standard error to be discussed below can be used
for Tests of Significance and confidence limits of centile statistics. The 
assumption of normality is warranted if the frequency distribution of the 
sample from which the centile measures are derived tends to be similar to the 
normal, bell-shaped curve. 


Standard Error of Any Centile 

In its implications for a Test of Significance, the standard error of any 
centile is analogous to the standard error of a measure or test score already 
described. However, the measure or test score is now stated in terms of a 
centile. Confidence limits for a centile have some advantage over such limits 
for an original measure of a distribution unless the latter is converted to 
a z score, or unless the value of $\sigma_X$ is stated in terms of the standard deviation
of the X variable (as it often is). The advantage is that confidence limits for any
centile are themselves stated in the relative terms of the centile point system.

The formula for the standard error of any centile is as follows: 

$$\sigma_C = \frac{\sigma}{y}\sqrt{\frac{pq}{N_s}}$$
[13:9] Standard error of any centile

where $\sigma$ is the standard deviation of the distribution from which the centile
measure is derived; y is the ordinate value at the particular centile point on a
normal distribution (see Table I, Appendix B); p is the proportion of the
frequencies of the sample distribution which are above or below the particular
centile point; q is the remaining proportion of the frequencies (1.0 - p); and
$N_s$ is the size of the sample.

The standard error for either the first or second tercile point of a
distribution with a standard deviation of 20 and $N_s$ equal to 100 is as follows:

$$\sigma_{C_{33}} \text{ or } \sigma_{C_{67}} = \frac{20}{.364}\sqrt{\frac{(.33)(.67)}{100}} = 54.95(.047) = 2.58$$

where .364 is obtained from Table I, Appendix B, as the value of y at a point
that divides the total area of the distribution into 1/3 and 2/3, so that 1/6
(or .167) of the total area lies between that point and the mean. Since
$\sigma_{C_{33}} = 2.58$, confidence limits in terms of T = 2.5 for the first tercile
point (strictly $C_{33\frac{1}{3}}$) will be:

$$C_{33} \pm 2.5\sigma_{C_{33}} = C_{33} \pm 2.5(2.58) = C_{33} \pm 6.45 \approx C_{27} \text{ and } C_{40}$$
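Formula 13:9 can be sketched with Python's `statistics.NormalDist` standing in for the Table I lookup of the ordinate y (an assumption of this sketch: the computed ordinate, about .3636, differs slightly from the tabled .364). With the text's rounded inputs the result is 2.58; with the exact tercile proportion of 1/3 it comes out nearer 2.59.

```python
import math
from statistics import NormalDist

def se_centile(sigma, n, p, y=None):
    """Formula 13:9: (sigma / y) * sqrt(p * q / n).
    If y is not supplied, it is computed as the standard normal ordinate at the
    point that cuts off proportion p (this replaces the Table I lookup)."""
    if y is None:
        y = NormalDist().pdf(NormalDist().inv_cdf(p))
    return sigma / y * math.sqrt(p * (1 - p) / n)

book = se_centile(20, 100, 0.33, y=0.364)   # the text's rounded inputs: about 2.58
exact = se_centile(20, 100, 1 / 3)          # exact tercile proportion: about 2.59
print(round(book, 2), round(exact, 2))
```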





We can thus be confident that the lower parameter tercile point will not be 
less than an original score equivalent to the 27th centile or greater than a score 
at the 40th centile. 

With the centile method we can obtain Q, the quartile deviation, more
readily than $\sigma$, the standard deviation. If the distribution of the sample result
is fairly normal (as is assumed to be the case), we can express Formula 13:9
in terms of Q instead of $\sigma$, since $\sigma$ is about 1.5 times larger than Q:
$\sigma \approx 1.483Q$. But Q is a somewhat less reliable measure of variability than
$\sigma$, and consequently $\sigma$ is preferred. The standard error of any centile with the
measure of variability in terms of Q instead of $\sigma$ is given by the following:

$$\sigma_C = \frac{1.483Q}{y}\sqrt{\frac{pq}{N_s}}$$
[13:9a] Standard error of any centile, in terms of Q


Standard Error of the Median 

The standard error of the median, which is the 50th centile, can be obtained
from Formula 13:9; but since the values of y, p, and q are always
constant, the formula can be simplified to the following:

$$\sigma_{Mdn} = \frac{\sigma}{.399}\sqrt{\frac{(.50)(.50)}{N_s}} = \frac{1.253\,\sigma}{\sqrt{N_s}}$$
[13:10] Standard error of the median

Or, in terms of Q,

$$\sigma_{Mdn} = \frac{1.253(1.483Q)}{\sqrt{N_s}} = \frac{1.858Q}{\sqrt{N_s}}$$
[13:10a] In terms of Q


It will be observed in Formula 13:10 that the standard error of the median
is about 25% larger than the standard error of the mean ($1.253\sigma/\sqrt{N_s}$
as against $\sigma/\sqrt{N_s}$). This confirms statistically what was pointed out
earlier, viz., that the mean is a more reliable measure of central tendency than
the median.

Tests of Significance and confidence limits for a median are based essentially
on the same logical considerations as those for means, and hence will
not be discussed here.


Standard Errors of $Q_1$, $Q_3$, $D_1$, and $D_9$

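The following standard error formulas for the first and third quartile points
and the first and ninth decile points are commonly used:

$$\sigma_{Q_1} = \sigma_{C_{25}} \text{ or } \sigma_{Q_3} = \sigma_{C_{75}} = \frac{\sigma}{.3178}\sqrt{\frac{(.25)(.75)}{N_s}} = \frac{1.362\,\sigma}{\sqrt{N_s}}$$
[13:11] Standard error of quartiles

$$\sigma_{Q_1} \text{ or } \sigma_{Q_3} = \frac{1.362(1.483Q)}{\sqrt{N_s}} = \frac{2.020Q}{\sqrt{N_s}}$$
[13:11a] In terms of Q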





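The standard errors of $C_{10}$ and $C_{90}$ are:

$$\sigma_{D_1} = \sigma_{C_{10}} \text{ or } \sigma_{D_9} = \sigma_{C_{90}} = \frac{\sigma}{.1755}\sqrt{\frac{(.10)(.90)}{N_s}} = \frac{1.709\,\sigma}{\sqrt{N_s}}$$
[13:12] Standard error of $D_1$ and $D_9$

$$\sigma_{D_1} \text{ or } \sigma_{D_9} = \frac{1.709(1.483Q)}{\sqrt{N_s}} = \frac{2.534Q}{\sqrt{N_s}}$$
[13:12a] In terms of Q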
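The constants 1.253, 1.362, and 1.709 in Formulas 13:10 to 13:12 are $\sqrt{pq}/y$ evaluated at p = .50, .25, and .10, and 1.483 is 1/.6745. A short check, again using `statistics.NormalDist` for the normal ordinates; the computed quartile coefficient (about 1.3626) agrees with the text's 1.362 to within rounding in the third decimal.

```python
import math
from statistics import NormalDist

std = NormalDist()

def centile_coefficient(p):
    """Coefficient k in sigma_C = k * sigma / sqrt(N_s) (Formula 13:9 with sigma and
    N_s factored out): sqrt(p * q) divided by the normal ordinate at the centile point."""
    y = std.pdf(std.inv_cdf(p))
    return math.sqrt(p * (1 - p)) / y

for label, p in [("median", 0.50), ("quartiles", 0.25), ("deciles", 0.10)]:
    print(label, round(centile_coefficient(p), 3))

# sigma / Q for a normal distribution, since Q is about .6745 sigma
print("sigma/Q", round(1 / std.inv_cdf(0.75), 3))
```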


Standard Errors of Centile Measures of Variability

Tests of Significance and confidence limits for the quartile deviation (Q),
the tercile deviation (T.D.), and the $C_{10}$ to $C_{90}$ range (D) are based on the
same logic as those for the standard deviation. The standard error formulas
for each, stated in terms of both $\sigma$ and Q, are given simply for reference.

The standard error of the quartile deviation, Q, is given by the following: 

$$\sigma_Q = \frac{.787\,\sigma}{\sqrt{N_s}}$$
[13:13] Standard error of the quartile deviation, Q

$$\sigma_Q = \frac{.787(1.483Q)}{\sqrt{N_s}} = \frac{1.167Q}{\sqrt{N_s}}$$
[13:13a] In terms of Q

The standard error of the tercile deviation, T.D., is: 


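$$\sigma_{T.D.} = \frac{.648\,\sigma}{\sqrt{N_s}}$$
[13:14] Standard error of the tercile deviation

$$\sigma_{T.D.} = \frac{.648(1.483Q)}{\sqrt{N_s}} = \frac{.961Q}{\sqrt{N_s}}$$
[13:14a] In terms of Q

$$\sigma_{T.D.} = \frac{.648(2.317\,T.D.)}{\sqrt{N_s}} = \frac{1.501\,T.D.}{\sqrt{N_s}}$$
[13:14b] In terms of T.D.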

The standard error of the D range is: 


$$\sigma_D = \frac{2.279\,\sigma}{\sqrt{N_s}}$$
[13:15] Standard error of the D range

$$\sigma_D = \frac{2.279(1.483Q)}{\sqrt{N_s}} = \frac{3.380Q}{\sqrt{N_s}}$$
[13:15a] In terms of Q

$$\sigma_D = \frac{2.279(.390D)}{\sqrt{N_s}} = \frac{.889D}{\sqrt{N_s}}$$
[13:15b] In terms of D


* This has been derived from the general formula for the standard error of a range.
Cf. C. C. Peters and W. R. Van Voorhis, Statistical Procedures and Their Mathematical 
Bases, McGraw-Hill, New York, 1940, pp. 148-150. 
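The coefficients .787, .648, and 2.279 in Formulas 13:13 to 13:15 can be recovered from the large-sample variance of a sample centile, $pq\sigma^2/(N_s y^2)$, together with the covariance between two centiles, which for the symmetric pair at p and 1 - p is $p^2\sigma^2/(N_s y^2)$. This derivation is supplied here only as a consistency check under the assumption of a normal distribution; it is not necessarily the route taken in the sources cited.

```python
import math
from statistics import NormalDist

std = NormalDist()

def spread_coefficient(p):
    """SE coefficient k (SE = k * sigma / sqrt(N_s)) for the distance between the
    centiles at p and 1 - p of a normal distribution, using
    var(centile) = p*q / y**2 and cov(lower, upper) = p*p / y**2 (per unit sigma**2 / N_s)."""
    y = std.pdf(std.inv_cdf(p))
    var_diff = (2 * p * (1 - p) - 2 * p * p) / y ** 2
    return math.sqrt(var_diff)

q_dev = spread_coefficient(0.25) / 2   # Q is half the C25 to C75 distance: about .787
td = spread_coefficient(1 / 3) / 2     # T.D. is half the C33 to C67 distance: about .648
d_range = spread_coefficient(0.10)     # D is the whole C10 to C90 range: about 2.279
print(round(q_dev, 3), round(td, 3), round(d_range, 3))
```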





Tests of Significance for Product-Moment Correlation Coefficients 


An estimate of the standard error of product-moment correlation coefficients
is best obtained by converting r to Fisher’s z function, to be presented
later. However, for small values of r a satisfactory estimate can be obtained
with the following:

$$\sigma_r = \frac{1 - r_h^2}{\sqrt{N_s - 1}}$$
[13:16] Standard error of product-moment correlation

where $r_h$ is equal to the parameter value of r for the hypothesis tested, and
$N_s$, as usual, is the size of the sample.


The Hypothesis That r Equals Zero (The Null Hypothesis)

Fortunately, Formula 13:16 gives a very good estimate for testing the 
hypothesis that the correlation coefficient for a universe is zero. The sampling 
distribution for this hypothesis can be assumed to be normally distributed 
as long as the samples consist of 25 or more cases. 

The hypothesis that $r_h$ is zero illustrates an implication of the concept of the
null hypothesis. Some investigators classify as null hypotheses all those whose
parameter values are taken as equal to zero. More generally, however, a null
hypothesis is defined as one that represents what would be expected under
fortuitous or chance conditions. The null hypothesis is important in correlation
because a Test of Significance permits a decision as to whether or not
the sample correlation coefficient could have been obtained from a universe 
whose parameter r is equal to zero. If this hypothesis cannot be rejected, it 
follows that the correlation coefficient is likely to occur on the basis of chance 
alone. Hence, we cannot conclude that there are any determining factors, 
other than chance factors, underlying the correlation between two variables. 
On the other hand, if the null hypothesis can be rejected, we can conclude 
that extra-chance factors account for at least some of the correlation in the 
sample result. It should be emphasized, however, that the statistical test in 
itself does not provide any information as to the nature of such factors. 

We shall illustrate a Test of Significance for the null hypothesis with the
data in Fig. 8:1, consisting of 151 pairs of height and weight measures for a
sample of one-year-old girls. The product-moment correlation coefficient was
found to be equal to .67. From the point of view of sampling statistics, the
question arises as to whether this result can be interpreted as signifying a
correlation between height and weight attributable to other than
purely chance factors. The Test of Significance for the hypothesis that the
sample result is derived from a universe whose parameter r is equal to zero
is as follows:

$$T = \frac{.67 - 0}{\frac{1 - 0^2}{\sqrt{151}}} = \frac{.67}{\frac{1}{\sqrt{151}}} = \frac{.67}{.08} = 8.4$$



TESTS OF SIGNIFICANCE FOR PRODUa-MOMENT 385 

where .67 is the sample r, 0 is the parameter r of the hypothesis, 151 is the 
size of the sample, and 8.4 is the test ratio, T.* 



This Test of Significance yields a T ratio of 8.4. Consequently, in accordance 
with the criterion of a T ratio of 2.5 for a non-chance result, we can definitely 
reject the null hypothesis. It is most unlikely that the relationship between 
height and weight observed in this sample of 151 cases can be explained as 
being due to the operation of purely chance factors. It will have to be
explained in terms of extra-chance factors. What these are depends on the
nature of the data. In the case of these height-weight measurements, these 
factors are doubtless inherent in infant development, both height and weight 
being aspects of organic growth. 
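The null-hypothesis test for the height-weight correlation can be sketched as below. The function name `se_r` is hypothetical; its optional `subtract_one` flag mirrors the footnote's remark that subtracting one case from $N_s$ is unnecessary for large samples. Unrounded, T is about 8.2 rather than the text's 8.4 (which divides by $\sigma_r$ rounded to .08); the conclusion is unchanged.

```python
import math

def se_r(r_h, n, subtract_one=False):
    """Formula 13:16: (1 - r_h**2) / sqrt(n), or / sqrt(n - 1) if subtract_one is True."""
    return (1 - r_h ** 2) / math.sqrt(n - 1 if subtract_one else n)

r_s, n = 0.67, 151                      # height-weight correlation for 151 infants
t = (r_s - 0) / se_r(0, n)              # about 8.2 unrounded
t_book = r_s / round(se_r(0, n), 2)     # with sigma_r rounded to .08: about 8.4
print(round(se_r(0, n), 4), round(t, 1), round(t_book, 1))
```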

The Null Hypothesis and Significance 

When a Test of Significance for the null hypothesis yields a T ratio of less 
than 2.5, the sample coefficient is sometimes described as “insignificant.” It is 
important in such cases to recognize that this use of the concept “insignificant” 
means that the sample result is not reliably greater than zero. The implications 
of this conclusion may, however, have great significance with respect to a 
particular research problem or field of scientific inquiry. Thus, the fact that 
no correlation significantly greater than zero has been found between
intelligence and the shape of people’s heads is of considerable significance in
psychological theory. The failure to find any correlation between such attributes
does not in itself disprove phrenological hypotheses with absolute finality,
but it does discredit them and throws the burden of proof on those who
support such hypotheses.

Testing Other Hypotheses for r 

The null hypothesis is by no means the only relevant hypothesis for many 
sample product-moment correlation coefficients. It is, however, the first to 
be tested because there is no point to testing further hypotheses unless the 
result of a Test of Significance for the null hypothesis warrants its rejection. 
After all, if the null hypothesis cannot be rejected, we cannot proceed on the 
assumption that there is any significantly greater correlation between the
two variables than would be expected on the basis of chance alone.

When the rejection of the null hypothesis is warranted, we can then test 
the hypothesis that the sample result was derived from a universe with a 
parameter r of a definite value. A coefficient of .866 (or .87), for example, 
indicates a degree of correlation which in its predictive value is halfway 
between no predictive value (zero correlation) and perfect prediction (when r 

* The standard error of r for the null hypothesis is the reciprocal of the square root of
the size of the sample. The subtraction of one case from $N_s$ is unnecessary when $N_s$ is much
greater than 30. See Table I, Appendix C, for square roots and reciprocals.





equals 1.00). Is it at all likely that the correlation coefficient for the sample of 
height and weight measures could have been obtained from a universe whose 
r is .866? 

A Test of Significance for this problem cannot be based on the standard,
normal probability function, because the sampling distributions of correlation
coefficients of .67 or more become increasingly skewed as the size of the
coefficient increases. R. A. Fisher has developed a transformation function
for these high values of r which is approximately normally distributed for
any value of r. This transformation function is called $\mathbf{z}$ and its formula is as
follows:

$$\mathbf{z} = \tfrac{1}{2}\left[\log_e(1 + r) - \log_e(1 - r)\right]$$
[13:17] Fisher’s z transformation function for r

where r is the parameter value of the hypothesis to be tested.* The conversion
of values of r to $\mathbf{z}$ and of $\mathbf{z}$ to r is facilitated by Table 13:1.


Table 13:1. Values of Fisher’s z Function for Given Values of r†

   r      z         r      z         r      z         r      z
 .00    .00       .25    .26       .50    .55       .75    .97
 .01    .01       .26    .27       .51    .56       .76   1.00
 .02    .02       .27    .28       .52    .58       .77   1.02
 .03    .03       .28    .29       .53    .59       .78   1.05
 .04    .04       .29    .30       .54    .60       .79   1.07
 .05    .05       .30    .31       .55    .62       .80   1.10
 .06    .06       .31    .32       .56    .63       .81   1.13
 .07    .07       .32    .33       .57    .65       .82   1.16
 .08    .08       .33    .34       .58    .66       .83   1.19
 .09    .09       .34    .35       .59    .68       .84   1.22
 .10    .10       .35    .37       .60    .69       .85   1.26
 .11    .11       .36    .38       .61    .71       .86   1.29
 .12    .12       .37    .39       .62    .73       .87   1.33
 .13    .13       .38    .40       .63    .74       .88   1.38
 .14    .14       .39    .41       .64    .76       .89   1.42
 .15    .15       .40    .42       .65    .78       .90   1.47
 .16    .16       .41    .44       .66    .79       .91   1.53
 .17    .17       .42    .45       .67    .81       .92   1.59
 .18    .18       .43    .46       .68    .83       .93   1.66
 .19    .19       .44    .47       .69    .85       .94   1.74
 .20    .20       .45    .48       .70    .87       .95   1.83
 .21    .21       .46    .50       .71    .89       .96   1.95
 .22    .22       .47    .51       .72    .91       .97   2.09
 .23    .23       .48    .52       .73    .93       .98   2.30
 .24    .24       .49    .54       .74    .95       .99   2.65


* R. A. Fisher, Statistical Methods for Research Workers, Oliver & Boyd, London, 7th ed.,
1938, pp. 202-206. (Note that bold-face type is used to distinguish this function from z
scores.)

† Table 13:1 is adapted from Table VII of Fisher: Statistical Tables for Biological,
Agricultural and Medical Research, Oliver & Boyd, Ltd., Edinburgh, by permission of the
Author and Publishers.










The standard error of $\mathbf{z}$ is estimated by the following formula:

$$\sigma_{\mathbf{z}} = \frac{1}{\sqrt{N_s - 3}}$$
[13:18] Standard error of Fisher’s z function

where $N_s$ is the size of the sample. In this formula, which is an approximation
formula, the standard error of the $\mathbf{z}$ function is taken as independent of the
parameter value of r.

To test the hypothesis that a sample r may have been obtained from a
bi-variate universe with a correlation coefficient of .87, we first transform
this parameter value of r and the sample result for r (.67 for the height-weight
correlation) to Fisher’s z function. According to Table 13:1, when r equals
.87, z equals 1.33; when r equals .67, z equals .81. Hence the Test of
Significance for this hypothesis is as follows:

$$T = \frac{.81 - 1.33}{\frac{1}{\sqrt{151 - 3}}} = \frac{-.52}{.08} = -6.5$$


Since the T ratio is considerably larger than a criterion of 2.5 or 3.0, we 
can with confidence reject the hypothesis that the correlation between the 
heights and weights of the 151 infants could have arisen as a random sample 
of a universe whose correlation is as large as .87. 
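The arithmetic of this test can be checked with a short Python sketch. It assumes that Fisher's z is the inverse hyperbolic tangent of r (math.atanh), which is the function Table 13:1 tabulates. The function names are illustrative only. Because it carries unrounded values, the sketch gives a T of about -6.35 rather than the -6.5 obtained above from two-decimal table entries. The conclusion is unchanged.

```python
import math

def fisher_z(r):
    """Fisher's z transformation: z = atanh(r) = 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)

def se_z(n):
    """Approximate standard error of z (Formula 13:18); it does not depend on r."""
    return 1 / math.sqrt(n - 3)

def t_ratio_for_r(r_sample, r_hyp, n):
    """T ratio for the hypothesis that the universe correlation equals r_hyp."""
    return (fisher_z(r_sample) - fisher_z(r_hyp)) / se_z(n)

# Height-weight example from the text: sample r = .67, N = 151, hypothesis r = .87
print(round(t_ratio_for_r(0.67, 0.87, 151), 2))
```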

Although any number of hypotheses for values of r can be tested with 
Fisher’s z transformation function, we shall proceed to determine confidence 
limits and thereby find the limiting r values of the hypotheses which are 
likely (or unlikely) for the sample correlation of the height-weight measures. 


Confidence Limits for the Reliability of the Sample r 

We have seen that confidence limits for a statistic give a measure of its reliability. In this case the sample r is .67, and when r equals .67, z equals .81. Since the standard error of z is independent of the parameter value of r, $\sigma_z$ is .08, as computed above. The confidence limits in terms of z are therefore as follows:

$$z_s \pm 2.5(\sigma_z) = .81 \pm 2.5(.08) = .61 \text{ and } 1.01$$

For these confidence limits to be meaningful, they must be converted back to their respective r values. As indicated in Table 13:1, when z equals .61, r equals .54; and when z equals 1.01, r equals .76. Consequently, the confidence limits for r in terms of a T criterion of 2.5 are correlation coefficients of .54 and .76. We can therefore be confident that the sample result, .67, was derived from a universe whose product-moment correlation coefficient was at least .54 but not greater than .76. Hence these values of r are the estimated limits of reliability for the sample result. Since a coefficient between .54 and .76 has some predictive value, we can be confident that the





HYPOTHESES AND TESTS OF SIGNIFICANCE 


height of infants can be estimated from their weight, or that their weight 
can be estimated from their height, with a fair degree of accuracy on the 
average. 
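Running the same transformation in reverse gives the confidence limits. This minimal sketch uses math.tanh to convert the z limits back to r, where the text looks them up in Table 13:1. The function name is hypothetical.

```python
import math

def r_confidence_limits(r, n, criterion=2.5):
    """Confidence limits for a sample r via Fisher's z; criterion is in SE units."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    z_lo, z_hi = z - criterion * se, z + criterion * se
    # tanh is the inverse of the z transformation, so it maps the limits back to r.
    return math.tanh(z_lo), math.tanh(z_hi)

lo, hi = r_confidence_limits(0.67, 151)
print(f"{lo:.2f} to {hi:.2f}")
```

With unrounded arithmetic the limits come out at about .54 and .77. The upper value differs from the .76 read from the table only because a z of 1.01 falls between the tabled entries for r = .76 and r = .77.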

Tests of Significance for Other Correlation Coefficients 

Estimates of the standard errors for correlation coefficients other than 
product-moment r are presented here for reference. Their use is analogous 
to that of product-moment r in testing the null hypothesis, i.e., that the 
parameter correlation is zero. Although they do not provide very satisfactory 
estimates of confidence limits, they are satisfactory for determining whether 
a sample result can be interpreted as significantly greater than zero, i.e., 
whether there is any correlation between two attributes that cannot be 
explained on the basis of chance. 

Standard Error of Spearman's Rank-Difference Correlation Coefficient (Rho):

The following formula is satisfactory for testing the null hypothesis that a parameter rho is equal to zero:

$$\sigma_\rho = \frac{1 - \rho^2}{\sqrt{N_s}}$$

[13:19] Standard error of rho

When testing the null hypothesis, the numerator is equal to 1, and this formula therefore becomes:

$$\sigma_\rho = \frac{1}{\sqrt{N_s}}$$

[13:19a] Standard error of rho, for the null hypothesis

This estimate of the standard error of rho is the same as that for the standard error of r, and it is likewise unsatisfactory for the determination of confidence limits. A better procedure is to treat rho as r and then use Fisher's z transformation function. Tables that are sometimes used for converting rho to r are a mathematical over-refinement, because at no point is the correction greater than 0.018. This value is usually considerably less than the variation in rho or r that can be expected to result from chance errors in sampling and measurement.
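The following sketch shows both procedures for rho. The first is the null-hypothesis T ratio from Formula 13:19a. The second gives confidence limits by treating rho as r in Fisher's z transformation, as recommended above. The values rho = .40 and N_s = 30 are hypothetical, chosen only for illustration.

```python
import math

def t_rho_null(rho, n):
    """T ratio for the null hypothesis that the universe rho is zero (Formula 13:19a)."""
    return rho / (1 / math.sqrt(n))

def rho_limits_via_z(rho, n, criterion=2.5):
    """Confidence limits found by treating rho as r and applying Fisher's z."""
    z = math.atanh(rho)
    se = 1 / math.sqrt(n - 3)
    return math.tanh(z - criterion * se), math.tanh(z + criterion * se)

# Hypothetical illustration: rho = .40 from 30 pairs of ranks
print(round(t_rho_null(0.40, 30), 2))
print(rho_limits_via_z(0.40, 30))
```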


Standard Error of a Biserial Correlation Coefficient 

The standard error of biserial r is estimated by the following formula when both p and q are greater than .05:*

$$\sigma_{r_b} = \frac{\dfrac{\sqrt{pq}}{y} - r_b^2}{\sqrt{N_s}}$$

[13:20] Standard error of biserial r

where p is the proportion of the total group which is in the higher part of the dichotomized variable; $q = 1.0 - p$; y is the ordinate of the normal curve at the point which divides the distribution into two parts, with the proportion of the area above the point equal to p (see Table I, Appendix B); and $N_s$ is the size of the sample. The computation of is facilitated by Table VII, Appendix B.

* No satisfactory formulas are available for use when p or q is less than .05.

In testing the null hypothesis for a biserial correlation coefficient, the $r_b^2$ term in the numerator becomes zero, and hence the preceding formula becomes:

$$\sigma_{r_b} = \frac{\sqrt{pq}}{y\sqrt{N_s}}$$

[13:20a] Standard error of biserial r, for the null hypothesis
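Formulas 13:20 and 13:20a can be sketched in Python as follows. The sketch computes the ordinate y with the standard library's statistics.NormalDist instead of reading it from Table I. By symmetry, the ordinate is the same whether p is taken as the area above or below the point of division. Function names are hypothetical.

```python
import math
from statistics import NormalDist

def normal_ordinate(p):
    """Ordinate y of the unit normal curve at the point with area p above it."""
    std = NormalDist()
    return std.pdf(std.inv_cdf(1 - p))

def se_biserial(r_b, p, n):
    """Standard error of biserial r (Formula 13:20); intended for p and q above .05."""
    q = 1 - p
    y = normal_ordinate(p)
    return (math.sqrt(p * q) / y - r_b ** 2) / math.sqrt(n)

def se_biserial_null(p, n):
    """Formula 13:20a: the r_b term drops out when the hypothesis is r_b = 0."""
    return se_biserial(0.0, p, n)

# Hypothetical illustration: p = .30 of 100 cases in the higher category
print(round(se_biserial_null(0.30, 100), 4))
```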


The Standard Error of a Tetrachoric Correlation Coefficient 


The formula for the best estimate of the standard error of tetrachoric r is rather complex, but the following formula is satisfactory for testing the null hypothesis:

$$\sigma_{r_t} = \frac{\sqrt{pp'qq'}}{yy'\sqrt{N_s}}$$

[13:21] Standard error of $r_t$, the tetrachoric coefficient, for the null hypothesis

where p is the proportion of occurrences of a given class for the first variable; p' is the proportion of occurrences of the given class for the second variable; $q = 1.0 - p$; $q' = 1.0 - p'$; y is the ordinate of the normal distribution at the point of division between p and q; y' is the corresponding ordinate for the second variable (see Table I, Appendix B); and $N_s$ is the size of the sample.
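Here is a matching sketch of Formula 13:21. As before, the ordinates come from statistics.NormalDist rather than from Table I, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def ordinate_at_split(p):
    """Unit-normal ordinate at the point dividing the area into p and 1 - p."""
    std = NormalDist()
    return std.pdf(std.inv_cdf(p))

def se_tetrachoric_null(p, p_prime, n):
    """Standard error of tetrachoric r for the null hypothesis (Formula 13:21)."""
    q, q_prime = 1 - p, 1 - p_prime
    y, y_prime = ordinate_at_split(p), ordinate_at_split(p_prime)
    return math.sqrt(p * p_prime * q * q_prime) / (y * y_prime * math.sqrt(n))

# Hypothetical illustration: both attributes split .5/.5 in a sample of 100
print(round(se_tetrachoric_null(0.5, 0.5, 100), 4))
```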


Standard Error of the Coefficient of Association 

The standard error of the Coefficient of Association, used in estimating the correlation between two sets of dichotomized non-variable attributes, is given by the following:

$$\sigma_A = \frac{1 - A^2}{2}\sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}$$

[13:22] Standard error of the Coefficient of Association

where a, b, c, and d signify the frequencies in the respective cells of the two-by-two cross-tabulation of the two attributes correlated.

In testing the null hypothesis, i.e., that $A_h$ equals zero, the above formula simplifies to one-half the square root of the sum of the reciprocals of the frequencies in the four cells:

$$\sigma_A = \frac{1}{2}\sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}$$

Standard error of A for the null hypothesis
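Below is a sketch of Formula 13:22 and its null-hypothesis form. The function names are hypothetical. The cell counts and the value of A passed in are illustrative inputs, and A is assumed to have been computed already from the two-by-two table.

```python
import math

def se_association(A, a, b, c, d):
    """Standard error of the Coefficient of Association (Formula 13:22)."""
    return 0.5 * (1 - A ** 2) * math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

def se_association_null(a, b, c, d):
    """Null-hypothesis form: A = 0, leaving one-half the root of the summed reciprocals."""
    return se_association(0.0, a, b, c, d)

# Hypothetical illustration: 25 cases in each of the four cells
print(se_association_null(25, 25, 25, 25))
```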






Tests of Significance for the Skewness and Kurtosis of Distributions 

Two important properties of uni-modal distributions are (1) skewness and (2) kurtosis. We saw earlier that the skewness of the normal bell-shaped distribution is zero; i.e., the distribution is bilaterally symmetrical above and below the mean. Furthermore, the area of a normal distribution is distributed between successive intervals above and below the mean in the proportions given in Table I, Appendix B. If the results from a sample distribution are not exactly normal but can be treated as if they were derived from a normally distributed universe, most statistics from the sample can be assumed to have normal sampling distributions. Moreover, z scores can be computed for such distributions and interpreted on the basis of the normal curve.

For these reasons it is often relevant to ascertain whether or not a sample distribution diverges significantly from the normal probability type. The most exact Test of Significance for this purpose is made in terms of chi-square (see Chapter 15, Section A). However, rough tests can be made for the two properties of skewness and kurtosis. The tests presented here are based on the centile method and were developed by T. L. Kelley.*


Test of Significance for Skewness 


The skewness of a distribution can be measured in terms of the relationship of the range to the median. If the median is exactly midway between the limits of a distribution, the distribution is more likely to be bilaterally symmetrical than when the median is closer to one limit than to the other. The limits of a sample distribution, however, are in themselves relatively unstable. Consequently, the median is compared with the D range, i.e., $C_{10}$ to $C_{90}$, since the latter represents (approximately) the most stable limits for measuring a broad range of a sample distribution.

The Centile Measure of Skewness. A measure of skewness can therefore be formulated as follows:

$$Sk = \frac{C_{90} + C_{10}}{2} - C_{50}$$

[13:23] Centile measure of skewness

This formula will give zero for skewness if the median ($C_{50}$) is exactly midway between $C_{10}$ and $C_{90}$. If the skewness is negative, the tail of the distribution is extended toward the lower end of the scale. If the skewness is positive, the tail is extended toward the upper end of the scale.

The value of a measure of skewness is relative to the range of scores as well as to their size. Hence, in order to evaluate the significance of a given measure of skewness for a sample result, we need to test a relevant hypothesis. The most relevant hypothesis is that Sk is zero, since this will be its value if the sample is derived from a normal distribution. If this hypothesis cannot be rejected with confidence, we can conclude that the skewness in a sample result can be attributed to chance errors of sampling. That is, we can consider that the sample was drawn from a bilaterally symmetrical universe. Such a conclusion signifies not that the sample result was necessarily drawn from a normally distributed universe, but, rather, that it could have been. On the other hand, if the hypothesis that Sk is zero can be rejected with confidence, it is unlikely that the sample result was drawn from a normally distributed universe.

* T. L. Kelley, Statistical Method, Macmillan, New York, 1924, p. 77.

The Standard Error of Sk. The standard error of Sk, for the hypothesis that Sk is zero, is computed as follows:

$$\sigma_{Sk} = \frac{.5185(C_{90} - C_{10})}{\sqrt{N_s}} = \frac{.5185D}{\sqrt{N_s}}$$

[13:24] Standard error of skewness


Test of Significance for the Skewness of Test Scores. Table 13:2 shows the distribution of the scores of 250 college students on a true-false information test on a course in general psychology. The frequencies are cumulated in the right-hand column, and the centile values required for measuring the skewness of this distribution are given at the bottom of the table.

Table 13:2. Distribution of the Test Scores of 250 College Students

Test Scores      f      c.f.
100-101         11      250
 98-99          18      239
 96-97          33      221
 94-95          20      188
 92-93          25      168
 90-91          43      143
 88-89          36      100
 86-87          22       64
 84-85          11       42
 82-83          10       31
 80-81          14       21
 78-79           2        7
 76-77           4        5
 74-75           1        1

$C_{90}$: $\tfrac{9}{10}(250) = 225$;  $C_{90} = 97.5 + \tfrac{4}{18}(2) = 97.94$
$C_{75}$: $\tfrac{3}{4}(250) = 187.5$;  $C_{75} = 93.5 + \tfrac{19.5}{20}(2) = 95.45$
$C_{50}$: $\tfrac{1}{2}(250) = 125$;  $C_{50} = 89.5 + \tfrac{25}{43}(2) = 90.66$
$C_{25}$: $\tfrac{1}{4}(250) = 62.5$;  $C_{25} = 85.5 + \tfrac{20.5}{22}(2) = 87.36$
$C_{10}$: $\tfrac{1}{10}(250) = 25$;  $C_{10} = 81.5 + \tfrac{4}{10}(2) = 82.30$

For this distribution, Sk is as follows:

$$Sk = \frac{C_{90} + C_{10}}{2} - C_{50} = \frac{97.94 + 82.30}{2} - 90.66 = 90.12 - 90.66 = -.54$$


The distribution is therefore negatively skewed. But is the skewness greater than would be expected on the basis of chance for random samples drawn from a bilaterally symmetrical normal distribution? The Test of Significance that will answer this question is as follows:

$$T = \frac{Sk_s - Sk_h}{\sigma_{Sk}} = \frac{-.54 - 0}{.51} = -1.1$$

In this equation the value of $\sigma_{Sk}$, .51, is computed as follows:

$$\sigma_{Sk} = \frac{.5185(D)}{\sqrt{N_s}} = \frac{.5185(97.94 - 82.30)}{\sqrt{250}} = \frac{8.11}{15.81} = .51$$


Since the absolute value of T is only 1.1, we cannot reject with confidence the hypothesis that Sk is zero. Hence the skewness of the sample result does not differ significantly from zero. In other words, so far as skewness is concerned, this sample result could have been drawn from a normally distributed universe.
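The centiles at the bottom of Table 13:2 and the skewness test can be reproduced with the sketch below. Like the table's own computations, it assumes that scores are spread evenly within each two-point interval. It also takes each interval's real lower limit to be 0.5 below the apparent one. The variable and function names are illustrative.

```python
import math

# Table 13:2 as (apparent lower limit, frequency), lowest interval first.
TABLE_13_2 = [(74, 1), (76, 4), (78, 2), (80, 14), (82, 10), (84, 11), (86, 22),
              (88, 36), (90, 43), (92, 25), (94, 20), (96, 33), (98, 18), (100, 11)]
WIDTH = 2  # each class interval covers two score units

def centile(k, table=TABLE_13_2, width=WIDTH):
    """k-th centile by linear interpolation within the interval that contains it."""
    n = sum(f for _, f in table)
    target = k / 100 * n
    below = 0  # cumulative frequency below the current interval
    for lower, f in table:
        if below + f >= target:
            return (lower - 0.5) + (target - below) / f * width
        below += f
    raise ValueError("centile lies outside the distribution")

n = sum(f for _, f in TABLE_13_2)
c10, c50, c90 = centile(10), centile(50), centile(90)

sk = (c90 + c10) / 2 - c50                     # Formula 13:23
se_sk = 0.5185 * (c90 - c10) / math.sqrt(n)    # Formula 13:24
t = (sk - 0) / se_sk                           # hypothesis: Sk = 0
print(f"C10={c10:.2f} C50={c50:.2f} C90={c90:.2f} Sk={sk:.2f} T={t:.2f}")
```

Without intermediate rounding, the sketch gives an Sk of about -.54 and a T of about -1.05. This agrees with the -1.1 obtained above from rounded values.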


Test of Significance for Kurtosis 


A Centile Measure of Kurtosis. In a normal distribution, the quartile deviation (Q.D.) is approximately one-fourth as large as the D range. Kelley took this relationship as the basis for the following measure of kurtosis:

$$Ku = \frac{(C_{75} - C_{25})/2}{C_{90} - C_{10}} = \frac{Q.D.}{D}$$

[13:25] Centile measure of kurtosis

In a normal distribution the ratio of Q.D. to D is actually .2632, rather than .25. Earlier (page 348) we saw that a normal distribution is characterized as mesokurtic, a flattened distribution as platykurtic, and a peaked distribution as leptokurtic. In terms of the preceding centile measure of kurtosis, a distribution is mesokurtic when Ku = .2632; it is platykurtic if Ku > .2632; and it is leptokurtic if Ku < .2632.

The Standard Error of Ku. The standard error of the centile measure of kurtosis is given by the following:

$$\sigma_{Ku} = \frac{.27779}{\sqrt{N_s}}$$

[13:26] Standard error of kurtosis





As indicated at the bottom of Table 13:2, $C_{75}$ is 95.45 and $C_{25}$ is 87.36. The values of $C_{10}$ and $C_{90}$ have already been given. Hence,

$$Ku = \frac{(C_{75} - C_{25})/2}{C_{90} - C_{10}} = \frac{(95.45 - 87.36)/2}{97.94 - 82.30} = \frac{4.045}{15.64} = .2586$$

And

$$\sigma_{Ku} = \frac{.27779}{\sqrt{N_s}} = \frac{.27779}{\sqrt{250}} = .0176$$

The distribution is therefore not mesokurtic but slightly leptokurtic. But is this divergence from mesokurtosis significantly greater than would be expected on the basis of chance errors in sampling and measurement? The following Test of Significance will answer this question:

$$T = \frac{Ku_s - Ku_h}{\sigma_{Ku}} = \frac{.2586 - .2632}{.0176} = \frac{-.0046}{.0176} = -0.3$$


where .2632 is the kurtosis of the hypothesis to be tested, i.e., the mesokurtosis characteristic of a normal distribution. Since the absolute value of T is only 0.3, we cannot reject this hypothesis. Hence we conclude that the leptokurtosis of the distribution in Table 13:2 can be attributed to chance errors of sampling and measurement. In other words, so far as kurtosis is concerned, the sample could have been drawn randomly from a normally distributed universe.
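The kurtosis test can be checked in the same way. This sketch takes the centiles of Table 13:2 as given in the text. As an added illustration, not the author's derivation, it also recovers the normal-curve value .2632 from statistics.NormalDist quantiles.

```python
import math
from statistics import NormalDist

# Centiles of Table 13:2 as computed in the text
C10, C25, C75, C90 = 82.30, 87.36, 95.45, 97.94
N = 250
KU_NORMAL = 0.2632  # Ku of a normal distribution, as stated in the text

ku = ((C75 - C25) / 2) / (C90 - C10)   # Formula 13:25
se_ku = 0.27779 / math.sqrt(N)         # Formula 13:26
t = (ku - KU_NORMAL) / se_ku

shape = ("platykurtic" if ku > KU_NORMAL
         else "leptokurtic" if ku < KU_NORMAL
         else "mesokurtic")

# Q.D./D for the unit normal curve, from its 10th, 25th, 75th, and 90th centiles
std = NormalDist()
ku_normal = ((std.inv_cdf(0.75) - std.inv_cdf(0.25)) / 2) / (std.inv_cdf(0.90) - std.inv_cdf(0.10))

print(f"Ku={ku:.4f} se={se_ku:.4f} T={t:.2f} ({shape}); normal Ku={ku_normal:.4f}")
```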


E. THE PROBABLE ERROR AND TESTS OF SIGNIFICANCE 

Until recent years, Tests of Significance and confidence limits were expressed in terms of the probable error (P.E.) of a statistic more often than in terms of the standard error, and some statisticians still use the P.E. However, no Tests of Significance have been developed here in terms of P.E. for the preceding statistics. The probable error is derived from the standard error and therefore requires an unnecessary additional computation. In all sampling distributions that can be assumed to have the form of the standard normal probability curve, the P.E. of any statistic is given by the following:

$$\text{P.E.}_s = .6745\,\sigma_s$$

[13:27] Probable error of any statistic whose sampling distribution is normal

where the subscript s represents any statistic. P.E. is a measure of variability for a normal sampling distribution. It marks off the range containing 25% of the sample results above the mean, and likewise 25% below it. This range is given by $.6745\sigma$ (cf. Table I, Appendix B).
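The constant .6745 is the unit-normal deviate that cuts off 25% of the area above the mean. A brief sketch with the standard library's NormalDist confirms this. It also defines a hypothetical helper that converts a standard error into a P.E.

```python
from statistics import NormalDist

std = NormalDist()
pe_factor = std.inv_cdf(0.75)       # deviate with 75% of the area below it
share = std.cdf(pe_factor) - 0.5    # area between the mean and +1 P.E.

def probable_error(standard_error):
    """P.E. of a statistic whose sampling distribution is normal (Formula 13:27)."""
    return pe_factor * standard_error

print(round(pe_factor, 4), round(share, 4), round(probable_error(0.08), 4))
```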





The P.E.'s for the statistics treated in the preceding sections of this chapter are therefore equal to $.6745\sigma_s$; they are given below for reference.

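$$\text{P.E.}_{\%} = .6745\left(100\sqrt{\frac{pq}{N_s}}\right) = 67.45\sqrt{\frac{pq}{N_s}}$$
[13:28] P.E. of a percentage

$$\text{P.E.}_{p} = .6745\sqrt{\frac{pq}{N_s}}$$
[13:29] P.E. of a proportion

$$\text{P.E.}_{f} = .6745\sqrt{N_s pq}$$
[13:30] P.E. of a frequency

$$\text{P.E.}_{M} = .6745\frac{\sigma}{\sqrt{N_s}} = \frac{.6745\sigma}{\sqrt{N_s}}$$
[13:31] P.E. of the arithmetic mean

$$\text{P.E.}_{\text{meas}} = .6745\,\sigma_x\sqrt{1 - r_{xx}}$$
[13:32] P.E. of a measure ($r_{xx}$: reliability coefficient of the measure)

$$\text{P.E.}_{\sigma} = \frac{.6745(.707\sigma)}{\sqrt{N_s}} = \frac{.477\sigma}{\sqrt{N_s}}$$
[13:33] P.E. of the standard deviation

$$\text{P.E.}_{A.D.} = \frac{.6745(.603\sigma)}{\sqrt{N_s}} = \frac{.407\sigma}{\sqrt{N_s}}$$
[13:34] P.E. of the average deviation

$$\text{P.E.}_{C} = \frac{.6745\,\sigma\sqrt{pq}}{y\sqrt{N_s}}$$
[13:35] P.E. of any centile in terms of σ

$$\text{P.E.}_{C} = \frac{.6745(1.483Q)\sqrt{pq}}{y\sqrt{N_s}} = \frac{Q\sqrt{pq}}{y\sqrt{N_s}}$$
[13:35a] P.E. of any centile in terms of Q

$$\text{P.E.}_{Mdn} = \frac{.6745(1.253\sigma)}{\sqrt{N_s}} = \frac{.845\sigma}{\sqrt{N_s}}$$
[13:36] P.E. of the median in terms of σ

$$\text{P.E.}_{Mdn} = \frac{.6745(1.858Q)}{\sqrt{N_s}} = \frac{1.253Q}{\sqrt{N_s}}$$
[13:36a] P.E. of the median in terms of Q

$$\text{P.E.}_{Q} = \frac{.6745(.787\sigma)}{\sqrt{N_s}} = \frac{.531\sigma}{\sqrt{N_s}}$$
[13:37] P.E. of the quartile deviation in terms of σ

$$\text{P.E.}_{Q} = \frac{.6745(1.167Q)}{\sqrt{N_s}} = \frac{.787Q}{\sqrt{N_s}}$$
[13:37a] P.E. of the quartile deviation in terms of Q

$$\text{P.E.}_{T.D.} = \frac{.6745(.648\sigma)}{\sqrt{N_s}} = \frac{.437\sigma}{\sqrt{N_s}}$$
[13:38] P.E. of the tercile deviation in terms of σ

$$\text{P.E.}_{T.D.} = \frac{.6745(.961Q)}{\sqrt{N_s}} = \frac{.648Q}{\sqrt{N_s}}$$
[13:38a] P.E. of the tercile deviation in terms of Q

$$\text{P.E.}_{T.D.} = \frac{.6745(1.50\,T.D.)}{\sqrt{N_s}} = \frac{1.012\,T.D.}{\sqrt{N_s}}$$
[13:38b] P.E. of the tercile deviation in terms of T.D.

$$\text{P.E.}_{D} = \frac{.6745(2.279\sigma)}{\sqrt{N_s}} = \frac{1.537\sigma}{\sqrt{N_s}}$$
[13:39] P.E. of the D range ($C_{10}$ to $C_{90}$) in terms of σ

$$\text{P.E.}_{D} = \frac{.6745(3.380Q)}{\sqrt{N_s}} = \frac{2.280Q}{\sqrt{N_s}}$$
[13:39a] P.E. of the D range in terms of Q

$$\text{P.E.}_{D} = \frac{.6745(.889D)}{\sqrt{N_s}} = \frac{.600D}{\sqrt{N_s}}$$
[13:39b] P.E. of the D range in terms of D

$$\text{P.E.}_{r} = \frac{.6745(1 - r^2)}{\sqrt{N_s}}$$
[13:40] P.E. of product-moment r

$$\text{P.E.}_{r} = \frac{.6745}{\sqrt{N_s}}$$
[13:40a] P.E. of r for the null hypothesis

$$\text{P.E.}_{z} = \frac{.6745}{\sqrt{N_s - 3}}$$
[13:41] P.E. of Fisher's z function (approximate)

$$\text{P.E.}_{\rho} = \frac{.6745(1 - \rho^2)}{\sqrt{N_s}}$$
[13:42] P.E. of rank-difference correlation, rho

$$\text{P.E.}_{\rho} = \frac{.6745}{\sqrt{N_s}}$$
[13:42a] P.E. of rho for the null hypothesis

$$\text{P.E.}_{r_b} = \frac{.6745\left(\dfrac{\sqrt{pq}}{y} - r_b^2\right)}{\sqrt{N_s}}$$
[13:43] P.E. of biserial r

$$\text{P.E.}_{r_b} = \frac{.6745\sqrt{pq}}{y\sqrt{N_s}}$$
[13:43a] P.E. of biserial r for the null hypothesis

$$\text{P.E.}_{r_t} = \frac{.6745\sqrt{pp'qq'}}{yy'\sqrt{N_s}}$$
[13:44] P.E. of tetrachoric r for the null hypothesis

$$\text{P.E.}_{A} = \frac{.6745}{2}\sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}} = .3373\sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}$$
[13:45] P.E. of A for the null hypothesis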
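Each P.E. coefficient in the list above is .6745 times the corresponding standard-error coefficient, rounded to three decimals. The following sketch checks that multiplication for every formula that shows both coefficients. It does not verify the standard-error coefficients themselves, and the dictionary labels are hypothetical.

```python
# (standard-error coefficient, listed P.E. coefficient), copied from the formulas above
PAIRS = {
    "13:33 standard deviation": (0.707, 0.477),
    "13:34 average deviation": (0.603, 0.407),
    "13:35a centile, in terms of Q": (1.483, 1.000),
    "13:36 median, in terms of sigma": (1.253, 0.845),
    "13:36a median, in terms of Q": (1.858, 1.253),
    "13:37 quartile deviation, sigma": (0.787, 0.531),
    "13:37a quartile deviation, Q": (1.167, 0.787),
    "13:38 tercile deviation, sigma": (0.648, 0.437),
    "13:38a tercile deviation, Q": (0.961, 0.648),
    "13:38b tercile deviation, T.D.": (1.50, 1.012),
    "13:39 D range, sigma": (2.279, 1.537),
    "13:39a D range, Q": (3.380, 2.280),
    "13:39b D range, D": (0.889, 0.600),
}

def check_pe_coefficients(pairs, factor=0.6745):
    """Map each formula to whether factor * SE coefficient rounds to the listed value."""
    return {name: round(factor * se_coef, 3) == pe_coef
            for name, (se_coef, pe_coef) in pairs.items()}

for name, ok in check_pe_coefficients(PAIRS).items():
    print(f"{name}: {'ok' if ok else 'MISMATCH'}")
```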






The Probability Implications of P.E. 

The P.E. of a statistic whose sampling distribution is normal is only about two-thirds (.6745) as large as the standard error. The probability values associated with a given number of P.E. units therefore differ from those associated with the same number of standard errors. The logic of any Test of Significance in terms of P.E. is, however, the same as that of one in terms of σ.

Fig. 13:4. Relation of the Probable Error (P.E.) to the Standard Error (σ) for the Normal Probability Distribution



The probability implications of P.E. are shown in Fig. 13:4. Since a distance of 3.0 standard errors is equal to about 4.45 P.E.'s, the practical limits of a normal sampling distribution are often taken as either ±4 or ±5 P.E. units.

Percentage confidence levels in terms of P.E. are usually rounded as follows, when the probabilities include both tails of the distribution:

5% level = ±3.0 P.E. (P = .043)
1% level = ±4.0 P.E. (P = .007)
0.1% level = ±5.0 P.E. (P = .0008)

If the probabilities of only one tail are concerned, the 1% and 0.1% confidence levels will still be approximately 4.0 and 5.0 P.E., respectively, and the 2% confidence level will be approximately 3.0 P.E.

T ratios in terms of P.E. may also be used. A T ratio of less than 3.0 P.E. units thus provides a satisfactory criterion for likely hypotheses:

T < 3.0 P.E. = likely hypotheses
3.0 P.E. < T < 4.0 P.E. = tentative or unlikely hypotheses
T > 4.0 P.E. = unlikely hypotheses

The P values for 1 to 5 units of P.E. are summarized in Table 13:3. However, the P value for any fraction of a P.E. can be obtained from Table I, Appendix B, by converting the P.E. value into $x/\sigma$; this is done by multiplying the P.E. value by .6745. Thus, 2.2 P.E. units equal 2.2(.6745) standard error units, or 1.48σ.


Table 13:3. Probability Implications in Normal Sampling Distributions
of 1 to 5 Probable Error Units

(Total Area = 1.0)

Range                        Probability     Range              Probability
Within M + or - 1 P.E.       P = .250        Beyond 1 P.E.      P = .250
Within M + or - 2 P.E.       P = .411        Beyond 2 P.E.      P = .089
Within M + or - 3 P.E.       P = .478        Beyond 3 P.E.      P = .022
Within M + or - 4 P.E.       P = .496        Beyond 4 P.E.      P = .004
Within M + or - 5 P.E.       P = .4996       Beyond 5 P.E.      P = .0004
Within M + and - 1 P.E.      P = .500        Beyond ±1 P.E.     P = .500
Within M + and - 2 P.E.      P = .823        Beyond ±2 P.E.     P = .177
Within M + and - 3 P.E.      P = .957        Beyond ±3 P.E.     P = .043
Within M + and - 4 P.E.      P = .993        Beyond ±4 P.E.     P = .007
Within M + and - 5 P.E.      P = .9992       Beyond ±5 P.E.     P = .0008
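Table 13:3 can be regenerated from the normal curve. In this sketch, one P.E. is taken as the deviate with 75% of the area below it. Each distance in P.E. units is converted to standard-error units before the areas are found, as described above. The names are hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()
PE = STD.inv_cdf(0.75)  # one P.E. in standard-error units (about .6745)

def pe_probabilities(k):
    """Areas for a distance of k P.E. units from the mean of a normal distribution.

    Returns (within, one side), (beyond, one tail), (within, both sides), (beyond, both tails).
    """
    upper = STD.cdf(k * PE)
    within_one_side = upper - 0.5
    beyond_one_tail = 1 - upper
    return within_one_side, beyond_one_tail, 2 * within_one_side, 2 * beyond_one_tail

for k in range(1, 6):
    print(k, [round(x, 4) for x in pe_probabilities(k)])
```

The computed one-tail area beyond 4 P.E. is about .0035, which the table rounds to .004.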


F. TESTS OF SIGNIFICANCE FOR SMALL SAMPLES
Fisher's t Statistic *

We pointed out earlier that Tests of Significance for large sample and small sample theory are based upon the same logic and that their form is the same in both cases, i.e.:

$$T = \frac{S_s - S_h}{\sigma_s} \qquad \text{Test of Significance (large sample theory)}$$

$$t = \frac{S_s - S_h}{\sigma_s} \qquad \text{Test of Significance (small sample theory)}$$

When there are fewer than 25 or 30 cases in a sample, the probability implications of small rather than large sample theory are utilized. The essence of the distinction between T and t is that the forms and variability of the sampling distributions are different. Consequently, the probability values for a given t ratio will differ from those of a T ratio. (Cf. the probability values of small sample and large sample theory in Table 12:3.)

The t ratio of small sample theory is especially valuable in agricultural economics and biometrics (the fields of research in which Fisher's work has centered), since often only a few cases are necessary for an adequate result. In general, this is not true in psychology and the social sciences, although there are some notable exceptions, as when the sampling unit itself represents a collective entity. Thus 10 school systems may be an adequate sample for research on school budgets.

* R. A. Fisher, Statistical Methods for Research Workers, pp. 126 ff.

The following section on the t ratio for small samples has been included for 
convenience of reference. 


Table 13:4. Distribution of t for Tests of Significance of Small Samples *

                              Probability: P
(N_s - 1)     .5       .1       .05       .02       .01       .001
    1       1.000    6.314    12.706    31.821    63.657    636.619
    2        .816    2.920     4.303     6.965     9.925     31.598
    3        .765    2.353     3.182     4.541     5.841     12.941
    4        .741    2.132     2.776     3.747     4.604      8.610
    5        .727    2.015     2.571     3.365     4.032      6.859
    6        .718    1.943     2.447     3.143     3.707      5.959
    7        .711    1.895     2.365     2.998     3.499      5.405
    8        .706    1.860     2.306     2.896     3.355      5.041
    9        .703    1.833     2.262     2.821     3.250      4.781
   10        .700    1.812     2.228     2.764     3.169      4.587
   11        .697    1.796     2.201     2.718     3.106      4.437
   12        .695    1.782     2.179     2.681     3.055      4.318
   13        .694    1.771     2.160     2.650     3.012      4.221
   14        .692    1.761     2.145     2.624     2.977      4.140
   15        .691    1.753     2.131     2.602     2.947      4.073
   16        .690    1.746     2.120     2.583     2.921      4.015
   17        .689    1.740     2.110     2.567     2.898      3.965
   18        .688    1.734     2.101     2.552     2.878      3.922
   19        .688    1.729     2.093     2.539     2.861      3.883
   20        .687    1.725     2.086     2.528     2.845      3.850
   21        .686    1.721     2.080     2.518     2.831      3.819
   22        .686    1.717     2.074     2.508     2.819      3.792
   23        .685    1.714     2.069     2.500     2.807      3.767
   24        .685    1.711     2.064     2.492     2.797      3.745
   25        .684    1.708     2.060     2.485     2.787      3.725
   26        .684    1.706     2.056     2.479     2.779      3.707
   27        .684    1.703     2.052     2.473     2.771      3.690
   28        .683    1.701     2.048     2.467     2.763      3.674
   29        .683    1.699     2.045     2.462     2.756      3.659
   30        .683    1.697     2.042     2.457     2.750      3.646
   40        .681    1.684     2.021     2.423     2.704      3.551
   60        .679    1.671     2.000     2.390     2.660      3.460
  120        .677    1.658     1.980     2.358     2.617      3.373
    ∞        .674    1.645     1.960     2.326     2.576      3.291

* Table 13:4 is abridged from Table III of Fisher: Statistical Tables for Biological, Agricultural and Medical Research, Oliver & Boyd, Ltd., Edinburgh, by permission of the Author and Publishers.








Probability Values for t

Since the logic and procedure for Tests of Significance with small and large samples are similar, the only difference that remains to be considered concerns the determination of the probability of a t result. Such estimates are obtained from Table 13:4, the R. A. Fisher table of probability values for t in small samples. The t values are given in the body of Table 13:4, and P values are given at the top of each column. The size of the sample is given in terms of $N_s - 1$ in the left-hand column.*

The P values in the table include the probabilities for both tails of the sampling distribution. Thus, when $N_s$ is 15 and t is 3, the $N_s - 1$ row, i.e., 14, shows that the probabilities are about 1 in 100 that the result might diverge 3.0σ or more above or below the parameter mean of the sampling distribution.

Table 13:4 can be used with Tests of Significance for the various statistics discussed in the preceding sections, and it should be used when $N_s$ is less than 25 or 30.
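Table 13:4 itself is Fisher's. The sketch below is an alternative way to obtain an approximate two-tailed P for any t. It numerically integrates Student's t density with Simpson's rule, using only the standard library. The function names are hypothetical, and df stands for $N_s - 1$.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Probability of a divergence of |t| or more in either tail, as in Table 13:4.

    Simpson's rule over [0, |t|]; steps must be even. math.gamma overflows for df
    above roughly 340, far beyond the rows of Table 13:4.
    """
    t = abs(t)
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 1 - 2 * area_0_to_t

# Example from the text: N_s = 15 (so N_s - 1 = 14) and t = 3
print(round(two_tailed_p(3.0, 14), 4))
```

The result is slightly under .01, consistent with the "about 1 in 100" read from the row for 14.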


EXERCISES 

1. In what way does a hypothesis give meaning and direction to a research investigation?

2. For any Test of Significance, distinguish between making a probability estimate 
and evaluating the likelihood or unlikelihood of a result. 

3. What is the meaning of a confidence criterion? Distinguish between the different 
kinds of confidence criteria that are employed in statistical work, and indicate the 
relation between percentage confidence levels and T ratio criteria. 

4. Give an example of a Test of Significance in which the probability of only one 
tail of the sampling distribution must be considered in evaluating the significance 
of the T ratio. 

5. Give an example of a Test of Significance in which the probabilities of both tails 
of the sampling distribution are relevant to an evaluation of the significance of the 
T ratio. 

6. What is the difference between confidence criteria and confidence limits? For 
what particular purpose are the latter employed? 

7. What is the null hypothesis? Give several examples of statistical hypotheses that 
are of this form. 

Set up relevant Tests of Significance for the data in the following nine problems, 
determine the value of T for each test, and interpret your results: 

8. Of a random sample of 500 adults, 45% say they will vote for Mr. A; 35% say they 
will vote for Mr. B, and the remainder say they will vote for other candidates. 


* This table was originally developed in terms of degrees of freedom, which can usually be treated as equal to $N_s - 1$ for the Tests of Significance of the statistics considered in this chapter (cf. chap. 15).





9. Use the results of the Gallup Poll on a loan to England presented on page 317. 

a. For the national sample result, assume a total sample of 4000 cases.

b. For OCCUPATIONS, assume that the sample consisted of 500 business and professional people, 1000 white-collar workers, 1000 farmers, and 1500 manual workers.

c. For EDUCATION, assume that the college group consisted of 300 people, that the high-school group consisted of 2700, and that the grammar or no-school group consisted of 1000.

d. For POLITICAL PARTY, assume that the Republican voters totaled 1900 and the Democratic voters totaled 2100.

10. The mean percentage grade score received by a random sample of 75 college 
students is 80%; the standard deviation of the sample result is 7%. 

11. A subject is given 50 trials to judge whether pairs of weights are similar or different; the order of like and unlike pairs is randomized. The subject's judgments are correct in 30 of the 50 trials.

12. A subject is presented a series of 20 pairs of tones of the same pitch, each pair 
differing in intensity; he is to judge whether the second tone of each pair is louder 
or softer than the first. His judgment is correct in 11 cases out of the 20. 

13. A person receives a score of 124 on a test whose standard deviation is 25 test score units, whose mean is 110, and whose reliability coefficient is estimated to be .88.

14. The product-moment correlation between the intelligence test scores and the 
heights of a group of 50 high-school students is .05. 

15. The product-moment correlation between the mechanical aptitude test scores and the ages of a group of 75 high-school seniors is -.11.

16. With the variables in Table 6:7 (page 148), determine:

a. the kurtosis of each distribution 

b. the skewness of each distribution 

Set up confidence limits and interpret the reliability of the statistics in the following four problems:

17. The mean and standard deviation for the 1838 policemen and firemen on the Bennett Mechanical Comprehension Test (page 187) were 35.6 and 10.1, respectively.

18. The median score made on an intelligence test by a random sample of 150 people 
is 83. The quartile deviation of the sample result is 13. 

19. Use either variable given in Table 6:7 (page 148) and determine: 

a. the tercile deviation 

b. the D range 

c. the 12th vigintile 

d. the quartile deviation 

e. the 4th quintile 

20. A product-moment correlation coefficient of .80 between an intelligence test and 
an achievement test is obtained from a random sample of 90 high-school seniors. 
Set up confidence limits both in terms of r derived from the standard error of r, 
and in terms of r derived from the standard error of Fisher’s z function. 



CHAPTER 14 


Tests of Significance for Differences 
Between Statistics 


Any Test of Significance for a difference between two statistics is a test of 
the hypothesis that the parameter difference between two statistics is equal to 
zero. We shall characterize all such hypotheses as null hypotheses.* The 
importance of Tests of Significance for differences should be apparent. If a 
difference between two statistics is not significantly greater than zero, we 
cannot infer that the difference obtained from the sample result is more than 
a chance difference. 

Research problems often require Tests of Significance for the hypothesis of “no difference.” For example, in studying psychological differences between two groups, a comparison of sample results may show a difference, but a Test of Significance may not indicate a difference significantly greater than zero. Suppose, in that case, that the result is based on random samples of the two groups and that the measures of psychological characteristics are determined by satisfactory methods. Then we can be confident that no real differences exist in the characteristics compared. Again, in any controlled experimental investigation, a Test of Significance for any sample difference is crucial in determining whether the result yielded by the experimental variable is actually any different from what would be expected on the basis of chance alone. This point is illustrated in some of the examples in this chapter.


A. THE STANDARD ERROR OF A DIFFERENCE
BETWEEN ANY TWO STATISTICS

The general formula for the standard error of a difference between any two statistics is

$$\sigma_{S_x - S_y} = \sqrt{\sigma_{S_x}^2 + \sigma_{S_y}^2 - 2r_{S_xS_y}\sigma_{S_x}\sigma_{S_y}}$$

[14:1] Standard error of a difference between two statistics


* Some authors, following R. A. Fisher, restrict the use of the concept null hypothesis to those situations in which the statistics under consideration are, by hypothesis, taken as random samples from the same universe. In such cases the variance of the universe whose parameter is assumed to be zero is estimated from the combined results of the samples. In this chapter, however, we shall use the variance of each statistic, following the procedures of classical statistics. (Cf. C. C. Peters and W. R. Van Voorhis, Statistical Procedures and Their Mathematical Bases, McGraw-Hill, New York, 1940, pp. 177 ff.)







where $S_x$ represents any statistic (proportion, mean, median, standard deviation, correlation coefficient, etc.) derived from sample x, and $S_y$ represents any other statistic, derived from sample y, which is compared with the first. $\sigma_{S_x}^2$ is the variance of the sampling distribution of the first statistic, and $\sigma_{S_y}^2$ is the variance of the sampling distribution of the second statistic. $r_{S_xS_y}$ represents the correlation between the two statistics compared, and $\sigma_{S_x}$ and $\sigma_{S_y}$ are the standard errors of each statistic. The standard error formulas for statistics whose sampling distributions can be assumed to be normal have already been given in the preceding chapter; they are used in Formula 14:1. The only additional value required is the correlation coefficient, $r_{S_xS_y}$, between the two statistics whose difference is being compared.

Formula 14:1 is general and can be used in analyzing differences between 
statistics when the standard error of each statistic, as well as any correlation 
between them, can be satisfactorily estimated. Most Tests of Significance 
for a difference between two statistics are set up for two statistics of the 
same class. That is, differences between proportions, or between means or 
between correlation coefficients are usually analyzed rather than a difference 
between a proportion and a mean, for example. 
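Formula 14:1 can be written as a small function. This is a sketch only: the name `se_difference` and its argument names are hypothetical, and it assumes the standard error of each statistic (from the preceding chapter's formulas) and the correlation between the two statistics are already in hand. The numbers in the usage lines are arbitrary.

```python
import math

def se_difference(se_x: float, se_y: float, r_xy: float = 0.0) -> float:
    """Standard error of the difference S_x - S_y (Formula 14:1).

    se_x, se_y -- standard errors of the two statistics
    r_xy       -- correlation between the two statistics (0 for independent samples)
    """
    if not -1.0 <= r_xy <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    variance = se_x ** 2 + se_y ** 2 - 2.0 * r_xy * se_x * se_y
    # Guard against a tiny negative value caused by floating-point rounding.
    return math.sqrt(max(variance, 0.0))

# Independent samples: the third term drops out (this is Formula 14:2).
print(se_difference(3.0, 4.0))             # 5.0
# A positive correlation between the statistics shrinks the standard error.
print(se_difference(3.0, 4.0, r_xy=0.5))   # sqrt(13), about 3.61
```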

Standard Error of a Difference for Independent Samples 

Whenever a difference between two statistics is obtained from independent samples, the correlation between them is zero. Consequently, the third term of Formula 14:1 is equal to zero, and the general formula for the standard error of a difference simplifies to the following:

$$\sigma_{(S_x - S_y)} = \sqrt{\sigma_{S_x}^2 + \sigma_{S_y}^2} \qquad [14{:}2]$$

(Standard error of a difference between two statistics derived from non-correlated samples)

Thus the estimate of the standard error is the square root of the sum of the variances of the sampling distributions of each statistic. Even when the statistics compared are drawn from dependent samples, and hence there is likely to be some correlation between them, it may be unnecessary to compute the correlation coefficient for the third term of the general formula. If the standard error of the difference is based on the abbreviated formula and yields a T ratio equal to or greater than 2.5 or 3.0, the rejection of the null hypothesis is usually warranted, despite the possibility of a positive correlation between the two statistics. The third term will be negative if there is any positive correlation; a negative term decreases the estimate of the standard error and hence increases the value of the T ratio. When the correlation is positive, omitting the third term can therefore only understate T, so a T ratio that is already large enough would be larger still. If, however, there is a negative correlation between the two statistics, the standard error of the difference will be increased and the T ratio will be smaller.

On the other hand, when a result derived from dependent samples yields a T ratio of less than 2.5, the use of the third term may make a real difference in the conclusion drawn from the Test of Significance. Hence, under such circumstances, the correlation coefficient should be computed.
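The effect of the third term on the T ratio can be checked numerically. In the sketch below the difference (6.0), the two standard errors (2.0 each) and the correlations of plus and minus .4 are made-up illustration values, not data from the text, and the function name is hypothetical. Omitting the third term gives T of about 2.12 (doubtful); a positive correlation raises T to about 2.74, while a negative one lowers it to about 1.79.

```python
import math

def t_ratio(diff: float, se_x: float, se_y: float, r: float = 0.0) -> float:
    """T = (difference - 0) / standard error of the difference (Formula 14:1)."""
    se = math.sqrt(se_x ** 2 + se_y ** 2 - 2.0 * r * se_x * se_y)
    return (diff - 0.0) / se

diff, se_x, se_y = 6.0, 2.0, 2.0              # hypothetical values
t_abbrev = t_ratio(diff, se_x, se_y)          # third term omitted (r taken as 0)
t_pos = t_ratio(diff, se_x, se_y, r=0.4)      # positive correlation: smaller SE
t_neg = t_ratio(diff, se_x, se_y, r=-0.4)     # negative correlation: larger SE

print(f"abbreviated formula: T = {t_abbrev:.2f}")
print(f"r = +0.4:            T = {t_pos:.2f}")
print(f"r = -0.4:            T = {t_neg:.2f}")
```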

B. TESTS OF SIGNIFICANCE FOR A DIFFERENCE 
BETWEEN ANY TWO STATISTICS 

The logic of a Test of Significance for a difference between any two statistics is the same as that described in the preceding chapter for Tests of Significance of single statistics. Thus the Test of Significance yields a test ratio, T, as follows:

$$T = \frac{\text{sample difference} - \text{parameter difference of zero}}{\text{standard error of difference}} = \frac{(S_x - S_y) - 0}{\sigma_{(S_x - S_y)}} \qquad [14{:}3]$$

(T ratio for Test of Significance for a difference between two statistics)

where $(S_x - S_y)$ is the difference between the two statistics; zero is the parameter value of the difference for the hypothesis tested; and $\sigma_{(S_x - S_y)}$ is the standard error of the difference between the two statistics.

This Test of Significance yields a T ratio similar in its implications to the T ratios already considered. Whenever there is justification for assuming that the sampling distribution of differences is normal, the distribution for T in Table II, Appendix B, can be used to obtain a probability estimate for interpreting the result. When the samples are random and consist of 25 or more cases, the assumption regarding the sampling distribution of differences is usually warranted for the statistics considered in the preceding chapter.
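Assuming, as the text does, that the sampling distribution of differences is normal, the probabilities given by Table II can be approximated from the standard normal curve. The sketch below uses Python's `math.erfc`; the helper names are hypothetical, and the values are normal-curve approximations rather than a reproduction of Table II. For T = 2.5 the two-tailed area is about .012, which is why 2.5 is described as approximately the 1% level.

```python
import math

def t_ratio(s_x: float, s_y: float, se_diff: float, parameter_diff: float = 0.0) -> float:
    """Formula 14:3: (sample difference - parameter difference) / SE of difference."""
    return ((s_x - s_y) - parameter_diff) / se_diff

def one_tailed_p(t: float) -> float:
    """Normal-curve area beyond t in one tail."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def two_tailed_p(t: float) -> float:
    """Normal-curve area beyond |t| in both tails."""
    return math.erfc(abs(t) / math.sqrt(2.0))

# Hypothetical statistics: a difference of 6.0 with a standard error of 2.4.
t = t_ratio(10.0, 4.0, 2.4)
print(f"T = {t:.2f}")
for cutoff in (2.0, 2.5, 3.0):
    print(f"T = {cutoff}: one-tailed P = {one_tailed_p(cutoff):.4f}, "
          f"two-tailed P = {two_tailed_p(cutoff):.4f}")
```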

Confidence Criteria for the Significance of a Difference 

To determine whether the probability value of a test ratio warrants the rejection of the null hypothesis, we shall employ the same criteria as were used earlier. In other words, if the T ratio is equal to or greater than 2.5 (approximately the 1% confidence level), we shall ordinarily conclude that the difference for the sample result is unlikely on the basis of chance alone. Hence we can reject the null hypothesis and infer that the difference is attributable (at least in part) to extra-chance factors.

On the other hand, if the T ratio is less than 2.0, the rejection of the null hypothesis is not warranted. Hence in such cases we shall conclude that the difference is likely on the basis of chance alone and is therefore “insignificant.” As was emphasized regarding a Test of Significance for a correlation coefficient (Chapter 13), a difference may not be statistically “significant,” but it may have real significance from the psychological or research point of view.

If the T ratio is equal to or greater than 2.0 but less than 2.5, we shall ordinarily conclude that the difference is of doubtful significance. Such a ratio warrants only a tentative or doubtful inference with respect to the rejection or non-rejection of the null hypothesis.
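The three working criteria just described (2.5 and above, 2.0 up to 2.5, below 2.0) amount to a simple lookup. This is a hedged sketch: the function name and messages are hypothetical, and taking the absolute value of T assumes the same cutoffs apply to a difference in either direction.

```python
def interpret_t(t: float) -> str:
    """Classify a T ratio by the chapter's working criteria."""
    t = abs(t)  # assumption: direction of the difference does not change the cutoffs
    if t >= 2.5:
        return "significant: reject the null hypothesis"
    if t >= 2.0:
        return "doubtful: tentative inference only"
    return "insignificant: null hypothesis not rejected"

for t in (1.4, 2.0, 4.1):
    print(t, "->", interpret_t(t))
```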

C. TESTS OF SIGNIFICANCE FOR A DIFFERENCE BETWEEN
PERCENTAGES (OR PROPORTIONS) DERIVED FROM
NON-CORRELATED SAMPLES

The standard error of a difference between percentage or proportion statistics is obtained by substituting the standard error of each statistic in Formula 14:1. If the statistics are derived from independent (non-correlated) samples, the third term in the formula is zero and the standard error of the difference becomes a special case of Formula 14:2. Thus, for proportions:

$$\sigma_{(p_x - p_y)} = \sqrt{\frac{p_x q_x}{N_x} + \frac{p_y q_y}{N_y}} \qquad [14{:}4]$$

(Standard error of a difference between proportions derived from non-correlated samples)

If percentage statistics rather than proportions are used, the standard error obtained with Formula 14:4 is multiplied by 100:

$$\sigma_{(\%_x - \%_y)} = 100\sqrt{\frac{p_x q_x}{N_x} + \frac{p_y q_y}{N_y}} \qquad [14{:}5]$$

(Standard error of a difference between percentages derived from non-correlated samples)

A Test of Significance for the difference between two percentages is as follows:

$$T = \frac{(\%_x - \%_y) - 0}{\sigma_{(\%_x - \%_y)}} \qquad [14{:}6]$$

(T ratio for a Test of Significance of the difference between two percentages)

where the parameter difference of the hypothesis tested is taken as zero. The usefulness of this Test of Significance will be illustrated in connection with the evaluation of differences among groups in an opinion poll.

Elmo Roper, director of the Fortune magazine polls, in 1945 conducted an opinion poll* of a national stratified sample of adults on the question:

WE WANT TO KNOW HOW THE PUBLIC RATES PRESIDENT TRUMAN ON
SEVERAL SPECIFIC THINGS, FROM WHAT THEY HAVE SEEN OF HIM
UP TO NOW. SO FAR AS HIS HANDLING OF OUR RELATIONS WITH
FOREIGN COUNTRIES, CONGRESS, HOME PROBLEMS GOES, WOULD
YOU SAY PRESIDENT TRUMAN IS DOING AN EXCELLENT, GOOD, ONLY
FAIR OR POOR JOB?


* New York Herald Tribune, October 13, 1945.





The opinions of the respondents who supported Roosevelt and Dewey in 
1944 were analyzed for each aspect of the question. For home problems, the 
breakdown was as follows: 



Candidate voted for or favored in 1944:

| President Truman's Handling of Home Problems | Roosevelt (R) | Dewey (D) |
|---|---|---|
| Excellent | 13.6% | 11.3% |
| Good | 48.8 | 44.8 |
| Only fair | 19.3 | 26.1 |
| Poor | 3.4 | 5.0 |
| Don't know | 14.9 | 12.8 |
| Total | 100.0% | 100.0% |


These two groups of results represent independent sub-samples of the total sample and are therefore uncorrelated. Let us assume that each sub-sample consists of 1000 cases, so that $N_R = 1000$ and $N_D = 1000$. (Roper did not report the actual size of his sample.) For convenience the Roosevelt and Dewey groups have been taken as equal, although in a stratified-random sample a somewhat greater proportion of Roosevelt supporters would be expected in view of the actual 1944 election returns. The results when combined are as follows:


| President Truman's Handling of Home Problems | Roosevelt Supporters | Dewey Supporters |
|---|---|---|
| Excellent or good | 624 | 561 |
| Only fair or poor | 227 | 311 |
| Subtotal (with an opinion) | 851 | 872 |
| Don't know | 149 | 128 |
| Total (N assumed) | 1000 | 1000 |


We shall consider only the results for the respondents who had an opinion (eliminating the D.K.'s). The percentages, based on those who held an opinion, now become:

| Opinion | Roosevelt Supporters | Dewey Supporters |
|---|---|---|
| Excellent or good | 624/851 = 73.3% | 561/872 = 64.3% |
| Only fair or poor | 227/851 = 26.7% | 311/872 = 35.7% |
| Total | 100.0% | 100.0% |


Is the difference between the opinions of the R and D samples significantly greater than zero? The Test of Significance used here is as follows (the comparison being made in terms of the percentage of favorable opinions; the result would be the same, however, if the percentage of unfavorable opinions were used):

$$T = \frac{(\%_R - \%_D) - 0}{\sigma_{(\%_R - \%_D)}} = \frac{(73.3\% - 64.3\%) - 0}{100\sqrt{\frac{(.733)(.267)}{851} + \frac{(.643)(.357)}{872}}} = \frac{9.0\%}{2.2\%} = 4.1$$

Since the T ratio is greater than 2.5 or 3.0, we can reject with confidence the null hypothesis that the difference is zero. In other words, the difference is too great to be likely on the basis of chance for the hypothesis of no difference.

It follows, then, that at the time of this poll more Roosevelt supporters than 
Dewey supporters thought that Truman was doing a good or excellent job in 
handling home problems. (However, a considerable majority of both groups 
held this opinion.) 

[Fig. 14:1. Sampling Distribution of a Difference Between Two Percentages for the Hypothesis That the Difference Is Zero. (Roper Data) Horizontal axis marked in steps of the standard error, from -6.6% to 8.8%; sample differences to the left of zero favor the Dewey group, those to the right favor the Roosevelt group.]


Let us now consider the statistical implications of this result in relation to the sampling distribution of differences whose standard error was found to be 2.2%. We assume that the hypothetical sampling distribution is normal, as shown in Fig. 14:1. The mean of this distribution is the parameter percentage for the difference of the hypothesis tested, or zero. All sample differences above this mean indicate that more Roosevelt supporters had favorable opinions; all sample differences below this mean indicate that more Dewey supporters had favorable opinions. The statistical problem is to determine whether the difference obtained from a single sample result diverges so far above the parameter percentage of zero as to indicate that a zero difference or a difference in the other direction is unlikely.

As indicated in Fig. 14:1, when the T ratio is 4.1, the difference is 4.1 times the standard error. The probabilities are only .00003 (3 in 100,000) of obtaining on the basis of chance a sample difference as large as the one obtained, viz., 9.0%.
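The Roper computation can be reproduced in a few lines. The sketch assumes, as the text does, sub-samples of 1000 of whom 851 (Roosevelt) and 872 (Dewey) expressed an opinion; the helper name is hypothetical, and the probability is a one-tailed normal-curve approximation. Working from the raw counts rather than rounded percentages gives a difference of about 8.99 points, a standard error of about 2.22 points and T of about 4.05, slightly below the 4.1 obtained from the rounded 9.0%/2.2%; the conclusion is unchanged.

```python
import math

def se_diff_percent(p_x: float, n_x: int, p_y: float, n_y: int) -> float:
    """Formula 14:5: standard error (in percentage points) of %_x - %_y."""
    return 100.0 * math.sqrt(p_x * (1 - p_x) / n_x + p_y * (1 - p_y) / n_y)

# Favorable ("excellent or good") opinions among those who held an opinion.
p_r, n_r = 624 / 851, 851
p_d, n_d = 561 / 872, 872

diff = 100.0 * (p_r - p_d)                      # sample difference, percentage points
se = se_diff_percent(p_r, n_r, p_d, n_d)
t = (diff - 0.0) / se                           # Formula 14:6
p_one_tail = 0.5 * math.erfc(t / math.sqrt(2.0))  # area beyond T in one tail

print(f"difference = {diff:.2f}%, SE = {se:.2f}%, T = {t:.2f}, P = {p_one_tail:.5f}")
```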






D. TESTS OF SIGNIFICANCE FOR A DIFFERENCE BETWEEN 
PERCENTAGES DERIVED FROM CORRELATED SAMPLES 

The preceding Test of Significance involved two sub-samples which were uncorrelated. If, however, two sub-samples are matched by individual pairs, as in the experimental method of matched groups, the standard error of the difference will ordinarily be somewhat reduced, because the technique of matching pairs usually restricts the fluctuations (or variability) in sampling and produces a positive correlation between the paired results. The correlation between the two statistics in the third term of Formula 14:1 takes this restriction into account.

For percentages, Formula 14:1 becomes:

$$\sigma_{(\%_x - \%_y)} = 100\sqrt{\frac{p_x q_x}{N_x} + \frac{p_y q_y}{N_y} - 2\,r_{p_x p_y}\sqrt{\frac{p_x q_x\,p_y q_y}{N_x N_y}}} \qquad [14{:}7]$$

(Standard error of the difference between two percentages derived from matched or correlated samples)

with the correlation between two proportions, $r_{p_x p_y}$, equal to $r_{xy}$, the correlation between the paired results or variates of the matched samples. Very often $r_{xy}$ is not obtainable for proportions or percentages. If the standard error estimated with Formula 14:5 shows that the difference is significant, there is ordinarily no need to be concerned with the r term in Formula 14:7. But if Formula 14:5 gives a T ratio between 2.0 and 2.5, or less than 2.0, Formula 14:7 may yield a T ratio greater than 2.5, and thus warrant the inference that the difference is significant.

This type of problem with matched samples is illustrated by the following example, in which an analysis is made of the effect, on the attitudes of listeners, of introducing a commercial announcement in the middle of a radio program.*

* The methodological technique of the Program Analyzer developed by Drs. Paul Lazarsfeld and Frank Stanton is employed in such analyses. Cf. J. G. Peatman and T. Hallonquist, The Patterning of Listeners' Attitudes Toward Radio Broadcasts: Methods and Results, Stanford Univ. Press, Stanford University, 1945, especially chap. 1.

Two groups of 100 listeners each were matched pair by pair with respect to age, sex, education, and general attitude toward the type of program to be used in the experiment. The “control” group, C, heard the program without the commercial. The experimental group, E, heard the program with the commercial. The over-all attitudes of Groups C and E toward the program are as follows:

Group C: Like = 60%; Dislike = 40%

Group E: Like = 50%; Dislike = 50%

Thus, three-fifths of the control group liked the program when it did not include the commercial, whereas the experimental group, which heard the commercial, was evenly divided between “like” and “dislike.” Is the difference of 10 percentage points in the “likes” of the two groups significantly greater than zero? The Test of Significance which must be set up in order to answer this question is as follows:

$$T = \frac{(\%_C - \%_E) - 0}{\sigma_{(\%_C - \%_E)}} = \frac{(60\% - 50\%) - 0}{\sigma_{(\%_C - \%_E)}}$$

If the hypothesis that the difference is zero can be rejected, then, in view of the matching controls introduced, the more favorable attitudes of Group C should be attributable, at least in part, to the absence of the middle commercial from the program they heard.

The T ratio obtained in terms of the standard error of the difference, the latter estimated by Formula 14:5, is as follows:

$$T = \frac{(60\% - 50\%) - 0}{100\sqrt{\frac{(.60)(.40)}{100} + \frac{(.50)(.50)}{100}}} = \frac{10\%}{100\sqrt{.0049}} = \frac{10\%}{7\%} = 1.4$$

Since the T ratio is 1.4, the null hypothesis cannot be rejected on this estimate of the standard error.

A more accurate estimate of the standard error of the difference can be obtained with the complete formula (14:7), which takes into account any correlation between the attitudes of these matched samples. In order to do this, the attitudes of the two groups toward the program must be correlated. Since their attitudes are dichotomized, a tetrachoric coefficient can be used for $r_{xy}$ in Formula 14:7. The cross-tabulation is as follows:



There is obviously a fairly high degree of correlation between the attitudes 
of the two groups. Reference to Thurstone’s Computing Diagrams * shows 
that the correlation is .71. The standard error is now as follows: 


* L. Chesire, M. Saffir, and L. L. Thurstone, Computing Diagrams for the Tetrachoric Correlation Coefficient, Univ. of Chicago Bookstore, Chicago, 1933.








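$$\sigma_{(\%_C - \%_E)} = 100\sqrt{\frac{(.60)(.40)}{100} + \frac{(.50)(.50)}{100} - 2(.71)\sqrt{\frac{(.60)(.40)(.50)(.50)}{10{,}000}}} = 100\sqrt{.0049 - .0034} = 100\sqrt{.0015} = 3.9\%$$

The Test of Significance is therefore:

$$T = \frac{(60\% - 50\%) - 0}{3.9\%} = \frac{10\%}{3.9\%} = 2.6$$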

The T ratio is greater than 2.5, which indicates a probability of less than 1 in 100 of a difference as great as 10% occurring on the basis of chance errors in sampling and measurement in a universe with a parameter difference of zero. We are warranted in rejecting this hypothesis with confidence and concluding that the introduction of a commercial announcement in the middle of the program had a negative effect (increased the dislikes) on the attitudes of the listeners.

The importance of the more precise estimate made possible by Formula 14:7 
is obvious from this example. The result completely reverses the inference 
that would have been made had the correlation between the two matched 
groups been unknown or been assumed to be zero. 
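The two standard errors for the commercial-announcement experiment can be recomputed side by side with Formula 14:7. This is a hedged sketch: the function name is hypothetical, and the tetrachoric r of .71 is taken as given rather than computed from the cross-tabulation. Carried to full precision, the correlation term is about .0035 rather than .0034, so the standard error comes out near 3.8% and T near 2.7; either way T exceeds 2.5, while the r = 0 version stays at about 1.4.

```python
import math

def se_diff_percent_matched(p_x, n_x, p_y, n_y, r_xy):
    """Formula 14:7: SE (percentage points) of %_x - %_y for matched samples."""
    var_x = p_x * (1 - p_x) / n_x
    var_y = p_y * (1 - p_y) / n_y
    cov_term = 2.0 * r_xy * math.sqrt(var_x * var_y)   # the "third term"
    return 100.0 * math.sqrt(var_x + var_y - cov_term)

p_c, p_e, n = 0.60, 0.50, 100      # "like" proportions, 100 matched listeners each
diff = 100.0 * (p_c - p_e)         # 10 percentage points

se_ignoring_r = se_diff_percent_matched(p_c, n, p_e, n, r_xy=0.0)   # same as Formula 14:5
se_with_r = se_diff_percent_matched(p_c, n, p_e, n, r_xy=0.71)      # Formula 14:7

print(f"r = 0:   SE = {se_ignoring_r:.2f}%, T = {diff / se_ignoring_r:.2f}")
print(f"r = .71: SE = {se_with_r:.2f}%, T = {diff / se_with_r:.2f}")
```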


E. TESTS OF SIGNIFICANCE FOR A DIFFERENCE BETWEEN 
ARITHMETIC MEANS DERIVED FROM NON-CORRELATED SAMPLES 


Tests of Significance for mean differences play an essential role in evaluating results in many research studies in psychology and related fields. A Test of Significance for a difference between the means of independent samples is as follows, $\sigma_M$ being obtained by Formula 13:5a:

$$T = \frac{(M_x - M_y) - 0}{\sigma_{(M_x - M_y)}} = \frac{(M_x - M_y) - 0}{\sqrt{\sigma_{M_x}^2 + \sigma_{M_y}^2}} \qquad [14{:}8]$$

(Test of Significance with the standard error of the difference between arithmetic means derived from uncorrelated samples)


The following example, with data from Klineberg,* illustrates a Test of Significance for differences between means obtained from independent samples, for which the third term of the general formula (14:1) is zero. Klineberg used a non-language intelligence test in order to study the intelligence of various European groups. All his subjects were boys ranging in age from 10 to 12 years. He obtained 10 samples of 100 boys in three European cities and seven rural areas, making a total of 1000 subjects tested. In all cases, the mean intelligence scores of the urban groups were greater than those of the rural groups.


* O. Klineberg, “A Study of Psychological Differences Between Racial and National Groups in Europe,” Archives of Psychology, 20:1-58, 1931.





His results for the combined three urban groups and the combined seven 
rural groups were as follows: 




| Group | N | Mean Intelligence Test Score | Standard Deviation |
|---|---|---|---|
| City | 300 | 215.7 | 45.1 |
| Rural | 700 | 187.1 | 50.9 |


The difference between the mean intelligence test scores is 28.6. The question of whether this difference is likely on the basis of chance or whether it may be ascribed, at least in part, to non-chance factors can be answered by a Test of Significance, as follows:

$$T = \frac{(215.7 - 187.1) - 0}{\sqrt{\left(\frac{45.1}{\sqrt{300}}\right)^2 + \left(\frac{50.9}{\sqrt{700}}\right)^2}} = \frac{28.6}{\sqrt{(2.60)^2 + (1.92)^2}} = \frac{28.6}{3.23} = 8.9$$

Since the T ratio shows that the mean difference is 8.9 times the standard 
error of the difference, the hypothesis that the parameter difference is zero 
can be rejected with confidence. That is, the difference between the mean 
intelligence test scores of these rural and urban groups cannot be attributed 
merely to chance factors of sampling and measurement. Extra-chance factors 
operated to produce at least some of it. What these factors were is not a 
statistical but a research problem which Klineberg considers in his analysis of 
the results. 
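Klineberg's city-rural comparison can be checked directly from the summary table. This sketch takes $\sigma_M = \sigma/\sqrt{N}$, as the worked example does; the function name is hypothetical. Without intermediate rounding it gives a standard error of about 3.24 and T of about 8.8; the 3.23 and 8.9 above come from rounding the two standard errors of the means to 2.60 and 1.92 first.

```python
import math

def t_independent_means(m_x, sd_x, n_x, m_y, sd_y, n_y):
    """Formula 14:8, with sigma_M = sigma / sqrt(N) for each sample."""
    se_mx = sd_x / math.sqrt(n_x)
    se_my = sd_y / math.sqrt(n_y)
    se_diff = math.sqrt(se_mx ** 2 + se_my ** 2)   # no third term: independent samples
    return ((m_x - m_y) - 0.0) / se_diff, se_diff

t, se = t_independent_means(215.7, 45.1, 300, 187.1, 50.9, 700)   # city vs rural
print(f"SE of difference = {se:.2f}, T = {t:.1f}")
```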


Fisher’s Null Hypothesis for Differences * 

It was indicated earlier that Fisher’s development and use of the concept 
null hypothesis are somewhat more specific than our more generalized use of 
the concept. We shall illustrate his method for the hypothesis of a zero 
difference between means of samples. 

Fisher interprets the null hypothesis as asserting that the statistics of the samples under consideration are derived from the same universe, and hence the parameter difference will be zero. The Test of Significance is therefore made in order to determine whether the data support or nullify the hypothesis. However, in our earlier examples we based the standard errors of differences on the variances of each statistic, as in classical statistics, whereas Fisher bases his on an estimate of the variance of the universe, derived from the average of the standard deviations of the sample results.

When there are two or more samples, the average of their respective 
standard deviations taken in reference to their own means is as follows: 


* R. A. Fisher, Statistical Methods for Research Workers, Oliver & Boyd, London, 7th ed., 1938, pp. 128 ff.





$$\sigma = \sqrt{\frac{\Sigma x_1^2 + \Sigma x_2^2 + \cdots + \Sigma x_n^2}{N_1 + N_2 + \cdots + N_n}} \qquad [14{:}9]$$

(Average of standard deviations for two or more samples, with deviations of each taken from their respective means)


And the standard error of the difference between the means of any two samples in the same universe is (when the size of each sample is greater than 30, so that $N$ instead of $N - 1$ is used):

$$\sigma_{(M_1 - M_2)} = \sqrt{\left(\frac{\sigma}{\sqrt{N_1}}\right)^2 + \left(\frac{\sigma}{\sqrt{N_2}}\right)^2} \qquad [14{:}10]$$

(Standard error of the difference between means, when the variance of the sampling distribution is based on the average of both samples)


But since $\sigma$ is the estimate of $\sigma_u$ (the standard deviation of the universe of the hypothesis), and the variance of both samples is therefore taken to be the same, $\sigma$ can be removed from the radical:

$$\sigma_{(M_1 - M_2)} = \sigma\sqrt{\frac{1}{N_1} + \frac{1}{N_2}} \qquad [14{:}10a]$$

(Standard error of the difference between two means when the variance of both samples is the same)

In using Klineberg’s sample data to test Fisher’s null hypothesis we assume that both his samples were drawn randomly from the same universe (rural-urban). The Test of Significance is made to determine whether or not this is the case. $\Sigma x^2$ is not given in Klineberg’s result, and we shall therefore have to determine it from $\sigma$ and $N$. Since

$$\sigma = \sqrt{\frac{\Sigma x^2}{N}}, \quad \text{it follows that} \quad \Sigma x^2 = N\sigma^2.$$

For the city sample, $\Sigma x_1^2 = 300(45.1)^2$, and for the rural sample, $\Sigma x_2^2 = 700(50.9)^2$.

Therefore

$$\sigma = \sqrt{\frac{\Sigma x_1^2 + \Sigma x_2^2}{N_1 + N_2}} = \sqrt{\frac{300(45.1)^2 + 700(50.9)^2}{300 + 700}} = \sqrt{\frac{2{,}423{,}770}{1000}} = 49.23$$


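Hence the Test of Significance for Fisher’s null hypothesis is:

$$T = \frac{(M_1 - M_2) - 0}{\sigma\sqrt{\frac{1}{N_1} + \frac{1}{N_2}}} = \frac{(215.7 - 187.1) - 0}{49.23\sqrt{\frac{1}{300} + \frac{1}{700}}} = \frac{28.6}{3.40} = 8.4$$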





With a T ratio of 8.4, the mean difference is of course considerably greater than would be expected on the basis of chance. The null hypothesis is therefore rejected with confidence, which means that the rural and urban samples are to be regarded as samples of two different universes with different parameter means, rather than samples from the same universe.

This Test of Fisher’s null hypothesis gives no information beyond that derived from the preceding test, in which the variance of each sample was estimated separately. The above T ratio of 8.4 is slightly less than the earlier 8.9 because the average of the standard deviations of the two samples, $\sigma = 49.23$, is somewhat larger than the $\sigma$ of the urban groups (45.1), which constituted only 3/10 of the total sample of 1000 cases.
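Fisher's version of the test can be sketched the same way: each $\Sigma x^2$ is rebuilt as $N\sigma^2$, combined by Formula 14:9 (an N-weighted combination of the sample variances), and used in Formula 14:10a. The function names are hypothetical; the results match the text's 49.23, 3.40 and 8.4.

```python
import math

def pooled_sd(sds, ns):
    """Formula 14:9, rebuilding each sum of squares as N * sd**2."""
    sum_sq = sum(n * sd ** 2 for sd, n in zip(sds, ns))
    return math.sqrt(sum_sq / sum(ns))

def t_fisher(m1, m2, sd_pooled, n1, n2):
    """Formula 14:10a: SE = pooled sd * sqrt(1/N1 + 1/N2)."""
    se = sd_pooled * math.sqrt(1.0 / n1 + 1.0 / n2)
    return ((m1 - m2) - 0.0) / se, se

sd = pooled_sd([45.1, 50.9], [300, 700])        # city, rural
t, se = t_fisher(215.7, 187.1, sd, 300, 700)
print(f"pooled sd = {sd:.2f}, SE = {se:.2f}, T = {t:.1f}")
```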

F. TESTS OF SIGNIFICANCE FOR A MEAN DIFFERENCE
BETWEEN CORRELATED SAMPLES

The following example illustrates a Test of Significance for mean differences 
when the means are obtained from samples which are not independent of each 
other. It also represents the type of analysis often required in determining 
whether the experimental variable in a psychological experiment has made 
any difference in the result. 

An investigator wishes to determine whether systematic coaching affects intelligence test performance. He gives an intelligence test to a sample of 200 twelve-year-old boys drawn randomly from the city’s school population. He then divides the total group into two groups of 100 each, and matches them pair by pair on the basis of their intelligence test scores. He uses one group of 100 as a control group (C), and the other group as an experimental group (E); the latter is coached systematically over a period of several weeks. At the end of this period he administers an alternative form of the intelligence test to both groups. The results for his experiment are as follows:




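Intelligence Test Score Results

| Matched Groups | N | Mean at Beginning of Experiment | σ at Beginning of Experiment | Mean at End of Experiment | σ at End of Experiment |
|---|---|---|---|---|---|
| Control group (C) | 100 | 95 | 12 | 97 | 13 |
| Experimental group (E) | 100 | 95 | 12 | 105 | 15 |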


The correlation between the intelligence test scores of the two groups, matched 
pair by pair, is .60. 

Since the two groups were matched, there should be little or no difference 
in their initial mean scores or in the variability of their initial performance. 
That this was actually the case is indicated by the mean of 95 and the standard 
deviation of 12 for each group at the beginning of the experiment. 





That coaching may have had a real effect on intelligence test performance 
is suggested by the final results. The mean score for the experimental group is 
now 105, as against 97 for the control group, a difference of 8 points. The 
question is whether such a difference is likely to occur by chance or whether 
the null hypothesis can definitely be rejected. If it can be, then in view of the 
experimental design of the investigation, we are warranted in concluding that 
coaching has a definite positive effect on the intelligence test performance of 
12-year-old boys. 

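The standard error of the difference between the means of matched groups is:

$$\sigma_{(M_C - M_E)} = \sqrt{\sigma_{M_C}^2 + \sigma_{M_E}^2 - 2\,r_{M_C M_E}\,\sigma_{M_C}\sigma_{M_E}} \qquad [14{:}11]$$

(Standard error of the difference between means derived from correlated samples)

The correlation between the means of bi-variates is the same as the correlation between the variates themselves. Hence, the above formula may be restated as follows:

$$\sigma_{(M_C - M_E)} = \sqrt{\left(\frac{\sigma_C}{\sqrt{N_C}}\right)^2 + \left(\frac{\sigma_E}{\sqrt{N_E}}\right)^2 - 2\,r_{CE}\left(\frac{\sigma_C}{\sqrt{N_C}}\right)\left(\frac{\sigma_E}{\sqrt{N_E}}\right)} \qquad [14{:}11a]$$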

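The Test of Significance which will enable us to answer the research question is therefore as follows:

$$T = \frac{(M_E - M_C) - 0}{\sigma_{(M_E - M_C)}} = \frac{(105 - 97) - 0}{\sqrt{\left(\frac{13}{\sqrt{100}}\right)^2 + \left(\frac{15}{\sqrt{100}}\right)^2 - 2(.60)\left(\frac{13}{\sqrt{100}}\right)\left(\frac{15}{\sqrt{100}}\right)}} = \frac{8}{\sqrt{1.69 + 2.25 - 2.34}} = \frac{8}{1.26} = 6.3$$

where 1.26 is the standard error of the difference between the means of the two groups.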

Since the T ratio is 6.3, we can be confident that the mean difference between the intelligence test performance of the control and the coached groups at the end of the experiment is unlikely on the basis of chance. Hence, we can reject the null hypothesis with confidence, and we are warranted in concluding that coaching has a real effect upon the intelligence test performance of boys at this age level.

It should be observed that the positive correlation of .60 between the intelligence test performance of the matched pairs in the two groups served to reduce the estimate of the standard error of the difference. Had this correlation been zero, the standard error of the difference would have been:

$$\sigma_{(M_E - M_C)} = \sqrt{1.69 + 2.25} = 1.98$$

Although this standard error is considerably larger than 1.26, the T ratio it gives (8/1.98 = 4.0) would still be large enough to warrant the rejection of the null hypothesis. Therefore, if it had not been convenient to compute the correlation coefficient, the Test of Significance without the third term of the standard error formula would still have led to the same conclusion. However, if there is any likelihood that the correlation between matched samples will be negative rather than positive, it should be computed and the third term of Formula 14:11a should be used, since a negative correlation serves to increase rather than decrease the size of the standard error.
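Both standard errors for the coaching experiment can be reproduced with a short sketch of Formula 14:11a. The function name is hypothetical, and $\sigma_M$ is taken as $\sigma/\sqrt{N}$, as in the worked example. The results agree with the text: about 1.26 and T of 6.3 with r = .60, and about 1.98 and T of 4.0 with r = 0.

```python
import math

def se_diff_correlated_means(sd_a, n_a, sd_b, n_b, r):
    """Formula 14:11a: SE of M_a - M_b for matched (correlated) samples."""
    se_a = sd_a / math.sqrt(n_a)
    se_b = sd_b / math.sqrt(n_b)
    return math.sqrt(se_a ** 2 + se_b ** 2 - 2.0 * r * se_a * se_b)

diff = 105 - 97   # experimental minus control, end of experiment
se_r = se_diff_correlated_means(15, 100, 13, 100, r=0.60)
se_0 = se_diff_correlated_means(15, 100, 13, 100, r=0.0)   # third term dropped
print(f"with r = .60: SE = {se_r:.2f}, T = {diff / se_r:.1f}")
print(f"with r = 0:   SE = {se_0:.2f}, T = {diff / se_0:.1f}")
```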

Effect of Heterogeneity of “Matched Samples" 

In the preceding experiment the subjects were matched pair by pair on the basis of initial intelligence test performance. Two additional factors were also controlled, viz., age and sex, by virtue of the restriction of the samples to 12-year-old boys. Had the samples been heterogeneous in age and sex, variability in these factors might in itself have accounted in part for the experimental result. That is, uncontrolled differences in age and sex might have been partly responsible for the higher performance of the coached group.

Therefore such factors should be controlled either (1) by setting up control and experimental groups that are relatively homogeneous, or (2) by matching the two groups on such factors as well as on the behavior to be studied (in this case, intelligence test performance). If neither of these procedures is used, a measure of control can be introduced by analyzing statistically the possible role of variability or heterogeneity within the groups as well as between them.*

* Cf., in this regard, Eugene Shen, “The Place of Individual Differences in Experimentation,” chap. 14 in Quinn McNemar and Maud A. Merrill (eds.), Studies in Personality, McGraw-Hill, New York, 1942.


G. TESTS OF SIGNIFICANCE FOR A DIFFERENCE BETWEEN
STANDARD DEVIATIONS

The comparison of differences in variability between two or more groups is often involved in a research problem. For example, in the study of sex differences, tradition held that there is greater variability among women than among men. No broad generalization, however, is warranted prior to obtaining empirical evidence on this problem. Furthermore, it is necessary to obtain empirical data for particular traits and attributes and to bring together the results of many investigations before broad generalizations about sex differences in variability are justified.

The analysis of differences in variability is also important in other types of research investigations, as in experiments in which the experimental variable may definitely affect the variability of the sample result. It is often relevant to determine whether a difference in variability between experimental and control groups can be attributed merely to chance or whether the difference is significantly greater than would be expected on the basis of chance.

The formula for the standard error of a difference between standard deviations is as follows:

$$\sigma_{(\sigma_x - \sigma_y)} = \sqrt{\sigma_{\sigma_x}^2 + \sigma_{\sigma_y}^2 - 2\,r_{\sigma_x \sigma_y}\,\sigma_{\sigma_x}\sigma_{\sigma_y}} \qquad [14{:}12]$$

(Standard error of the difference between standard deviations)

where $\sigma_{\sigma_x}^2$ is the estimated variance of the sampling distribution of the standard deviation of the x variable; $\sigma_{\sigma_y}^2$ is a similar measure for the y variable; $r_{\sigma_x \sigma_y}$ is the correlation between the standard deviations of the two variables; $\sigma_{\sigma_x}$ is the estimated standard error of the standard deviation of the x variable; and $\sigma_{\sigma_y}$ is the estimated standard error of the standard deviation of the y variable. (See Formula 13:7.)

The correlation coefficient for the standard deviations of two variables can be directly estimated from the correlation of the bi-variates. It is equal to the square of the correlation coefficient, $r_{xy}^2$. The preceding formula thus becomes:

$$\sigma_{(\sigma_x - \sigma_y)} = \sqrt{\sigma_{\sigma_x}^2 + \sigma_{\sigma_y}^2 - 2\,r_{xy}^2\,\sigma_{\sigma_x}\sigma_{\sigma_y}} \qquad [14{:}12a]$$

If the standard deviations are derived from independent samples, the correlation is zero and the third term of the above formula becomes zero.

A Test of Significance for the difference in the variability of two groups will 
be illustrated with David Wechsler’s data on the Information subtest of the 
Wechsler-Bellevue Scale for measuring intelligence.* Wechsler gives the 
means and standard deviations on each subtest for successive age groups 
from 7 to 60 years of age. The variability of performance on the Information 
subtest tends to increase with age. 

We shall compare the variability of two age groups, viz., from 25 through 29 years and from 40 through 44 years. The mean score of both these groups is the same, 10.1 points. But the standard deviation of the younger group is 2.98, whereas for the older group it is 3.70. Is this difference of 0.72 points reliable, i.e., significantly greater than zero? The following Test of Significance answers this question, the two age groups constituting independent (non-correlated) samples:

$$T = \frac{(\sigma_x - \sigma_y) - 0}{\sigma_{(\sigma_x - \sigma_y)}} = \frac{(\sigma_x - \sigma_y) - 0}{\sqrt{\left(\frac{\sigma_x}{\sqrt{2N_x}}\right)^2 + \left(\frac{\sigma_y}{\sqrt{2N_y}}\right)^2}} = \frac{(3.70 - 2.98) - 0}{\sqrt{\left(\frac{3.70}{\sqrt{2(75)}}\right)^2 + \left(\frac{2.98}{\sqrt{2(125)}}\right)^2}}$$


* David Wechsler, The Measurement of Adult Intelligence, Williams & Wilkins, Baltimore, 3rd ed., 1944. (Data from Table 39, p. 222.)





where $N_x$, the size of the older sample, is 75, and $N_y$, the size of the younger sample, is 125. T is equal to 2.0:

$$T = \frac{0.72}{\sqrt{(.3021)^2 + (.1885)^2}} = \frac{0.72}{\sqrt{.1268}} = \frac{0.72}{.36} = 2.0$$

The difference between the variabilities of the two groups is twice as large as the standard error of the difference. The P value for a T ratio of 2.0 or more, where only one tail of the sampling distribution of differences is concerned (as in Fig. 14:1), is approximately .02 (equivalent to the 2% confidence level; cf. Table II, Appendix B). Thus, if the difference in the universe were zero, a sample difference as large as 0.72 would be expected to arise from chance factors of sampling and measurement only about 2 times in 100.

If we are satisfied with these odds as a basis for rejecting the null hypothesis, we will conclude that there is a real difference in variability between the younger and older groups on the Information test of the Bellevue-Wechsler Scale. Certainly, when T = 2.0, we cannot accept the null hypothesis as likely. But if we wish to be cautious, we shall reject the null hypothesis only tentatively, and conclude that there is a real difference only if further samples of test scores from these two age groups support this generalization.

On the other hand, Wechsler’s data for all the age groups above 25 to 29 years indicate that the T ratio of 2.0 is sufficiently large to warrant rejection of the null hypothesis. The variability of their scores on the Information test is shown in Table 14:1. These data give a compelling reason for accepting a T ratio of 2.0 as a satisfactory criterion, because additional independent samples of older age groups all yield measures of variability larger than that of the 25 to 29 age group.


Table 14:1. Differences Between Successive Age Groups in Results on the Information Test of the Bellevue-Wechsler Scale of Intelligence *

Age Group     N      M      σ
25-29        125   10.1   2.98
30-34        110    9.8   3.12
35-39        100    9.8   3.37
40-44         75   10.1   3.70
45-49         60    9.5   3.21
50-54         45    9.6   4.08
55-59         36    9.5   3.86

* David Wechsler, The Measurement of Adult Intelligence, Williams & Wilkins, Baltimore, 3rd ed., 1944. (Data from Table 39, p. 222.)


Combining the Results of Several Groups for a Test of Significance 

To determine the T ratio for the difference in variability between the younger group and several older groups taken together, we shall combine the results for the three age groups from 40 to 54.

The standard deviation of a single group result is $\sqrt{\Sigma x^2/N}$, where the deviations, x, are taken from the mean of the sample. If the several groups to be combined have different means, the deviations of each sample must be taken from the weighted mean of the combined result. We shall compute this mean first for the three age groups, 40 to 44, 45 to 49, and 50 to 54. The weighted mean for two or more samples is given by the following formula:

$$M_c = \frac{N_1M_1 + N_2M_2 + \cdots + N_nM_n}{N_1 + N_2 + \cdots + N_n}$$

[14:13] Weighted mean of two or more groups combined

where the subscripts 1, 2, ... n identify the several groups to be combined, and the subscript c denotes the combined result. Therefore, for the three age groups whose N's and M's are given in Table 14:1, we have:

$$M_c = \frac{75(10.1) + 60(9.5) + 45(9.6)}{75 + 60 + 45} = \frac{1759.5}{180} \approx 9.8$$
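Formula 14:13 is a one-liner in code. This minimal Python sketch (the function name is an illustrative assumption) reproduces the weighted mean of the three age groups; unrounded it is 9.775, which the computation that follows rounds to 9.8.

```python
def weighted_mean(ns, means):
    """Formula 14:13: mean of several groups combined, each mean weighted by its N."""
    if len(ns) != len(means):
        raise ValueError("need one N per group mean")
    return sum(n * m for n, m in zip(ns, means)) / sum(ns)


# Age groups 40-44, 45-49, and 50-54 from Table 14:1
ns = [75, 60, 45]
means = [10.1, 9.5, 9.6]
m_c = weighted_mean(ns, means)
print(f"M_c = {m_c:.3f}")
```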


The standard deviation of combined groups, determined from the respective standard deviations of each, is as follows:

$$\sigma_c = \sqrt{\frac{N_1\sigma_1^2 + N_2\sigma_2^2 + \cdots + N_n\sigma_n^2 + N_1(M_1 - M_c)^2 + N_2(M_2 - M_c)^2 + \cdots + N_n(M_n - M_c)^2}{N_1 + N_2 + \cdots + N_n}}$$

[14:14] Standard deviation of two or more combined groups, with deviations taken from the weighted mean of the combination


where the subscripts 1, 2, ... n identify the groups whose measures are to be combined; and $M_c$ is the weighted mean of the combined samples (from Formula 14:13). For the three age groups, the standard deviation of the combined result is 3.66:

$$\sigma_c = \sqrt{\frac{75(3.70^2) + 60(3.21^2) + 45(4.08^2) + 75(10.1 - 9.8)^2 + 60(9.5 - 9.8)^2 + 45(9.6 - 9.8)^2}{75 + 60 + 45}} = \sqrt{13.3775} = 3.66$$
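Formula 14:14 adds a between-groups term to the pooled within-group variances. The Python sketch below is a minimal illustration of that formula (the function name is hypothetical). Passing the rounded $M_c = 9.8$ reproduces the 3.66 above, and letting the function compute the unrounded weighted mean changes the result by less than .001.

```python
import math


def combined_sd(ns, means, sds, m_c=None):
    """Formula 14:14: standard deviation of several groups combined.

    Each group's sigma uses deviations from its own mean (divisor N); the
    N * (M - M_c)**2 terms re-express those deviations about the weighted mean.
    """
    if m_c is None:
        m_c = sum(n * m for n, m in zip(ns, means)) / sum(ns)
    within = sum(n * s**2 for n, s in zip(ns, sds))
    between = sum(n * (m - m_c) ** 2 for n, m in zip(ns, means))
    return math.sqrt((within + between) / sum(ns))


ns, means, sds = [75, 60, 45], [10.1, 9.5, 9.6], [3.70, 3.21, 4.08]
print(f"{combined_sd(ns, means, sds, m_c=9.8):.2f}")  # rounded weighted mean
print(f"{combined_sd(ns, means, sds):.4f}")           # unrounded weighted mean
```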

We now have the standard deviation of the three age groups combined and can test the significance of the difference in variability between these three age groups and the younger age group (25 to 29). The Test of Significance is as follows:

$$T = \frac{(\sigma_c - \sigma_y) - 0}{\sigma_{(\sigma_c - \sigma_y)}} = \frac{(3.66 - 2.98) - 0}{\sqrt{\left(\dfrac{3.66}{\sqrt{2(180)}}\right)^2 + \left(\dfrac{2.98}{\sqrt{2(125)}}\right)^2}} = \frac{0.68}{\sqrt{(.1929)^2 + (.1885)^2}} = \frac{0.68}{.27} \approx 2.5$$





The T ratio, 2.5+, confirms what was suggested by the data in Table 14:1, viz., that the variability in the scores on the Information test is significantly greater for the older age groups than for the younger one. There is less than 1 chance in 100 that a difference in variability of 0.68 would occur in samples from a universe whose parameter difference is zero. Hence, we can reject the null hypothesis with confidence and conclude that the variability of the test results of the older age groups is somewhat greater than that of the younger group.
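The same check applies to the combined groups. This minimal Python sketch (names illustrative) treats the 40 to 54 combination as a single sample of N = 180 with σ = 3.66, as the text does, and uses the normal curve for the one-tailed P.

```python
import math


def se_sd(sd, n):
    """Estimated standard error of a standard deviation, sigma / sqrt(2N)."""
    return sd / math.sqrt(2 * n)


older_sd, older_n = 3.66, 180      # age groups 40-54 combined
younger_sd, younger_n = 2.98, 125  # age group 25-29

# Independent samples, so the correlation term is omitted
se_diff = math.sqrt(se_sd(older_sd, older_n) ** 2 + se_sd(younger_sd, younger_n) ** 2)
t = ((older_sd - younger_sd) - 0) / se_diff
p_one_tail = 0.5 * math.erfc(t / math.sqrt(2))
print(f"T = {t:.2f}, one-tailed P = {p_one_tail:.4f}")
```

The T ratio comes out at about 2.52, and the one-tailed P is well under .01, consistent with the conclusion above.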

H. TESTS OF SIGNIFICANCE FOR A DIFFERENCE BETWEEN 
COEFFICIENTS OF RELATIVE VARIATION 

We saw earlier, in presenting measures of variability for descriptive statistics (Chapter 7), that a comparison of the variability of two or more groups is sometimes misleading if it is made directly in terms of their standard deviations, especially if their means differ considerably. We saw further that when the variability of two or more distributions derived from the same scale or type of measure is compared, Pearson’s Coefficient of Relative Variation (V) can be used to avoid any misleading implications that might arise from the direct comparison of the standard deviations themselves. V expressed as a percentage is $V = 100\sigma/M$.

Since occasions do arise in which comparisons of relative variability are required, we shall give the standard error of V and present a Test of Significance for the difference between two Coefficients of Relative Variation. The standard error of V is given by the following:

$$\sigma_V = \frac{V}{\sqrt{2N_s}}\sqrt{1 + 2\left(\frac{V}{100}\right)^2}$$

[14:15] Standard error of the Coefficient of Relative Variation

where V is the Coefficient of Relative Variation (in percent) and $N_s$ is the size of the sample.

By Formula 14:1, the formula for the standard error of a difference between two Coefficients of Relative Variation is as follows:

$$\sigma_{(V_x - V_y)} = \sqrt{\sigma_{V_x}^2 + \sigma_{V_y}^2 - 2r_{V_xV_y}\,\sigma_{V_x}\sigma_{V_y}}$$

[14:16] Standard error of the difference between Coefficients of Relative Variation

For independent (non-correlated) samples, the third term of this formula becomes zero. In the following problem, based on a further comparison of some of Wechsler’s Information test data, $\sigma_{(V_x - V_y)}$ will therefore be equal to:

$$\sigma_{(V_x - V_y)} = \sqrt{\sigma_{V_x}^2 + \sigma_{V_y}^2}$$

[14:17] Standard error of the difference between Coefficients of Relative Variation derived from non-correlated samples







Wechsler reports the standard deviation of 7-year-olds on the Information subtest as 1.11. This σ is less than half that of the 25 to 29 age group, which was given in Table 14:1 as 2.98. However, it would be misleading to work directly with these standard deviations, because the mean score of the 7-year-olds is only one-fourth as large as that of the older group: $M_x = 2.5$ and $M_y = 10.1$. The Test of Significance must therefore be made in terms of their respective Coefficients of Relative Variation, V, which take into account these differences between means. Thus:

$$V_x = \frac{1.11}{2.5}(100) = 44.4\% \quad \text{(7-year-olds)}$$

$$V_y = \frac{2.98}{10.1}(100) = 29.5\% \quad \text{(25 to 29 age group)}$$

The standard deviation of the 7-year-olds is nearly 45% as large as its mean, whereas for the older age group it is only 30% as large as its mean. The difference in relative variability is 44.4% − 29.5% = 14.9 percentage points. Is this difference significantly greater than zero?

The Test of Significance is as follows, Nx being 50 and Ny being 125: 



The T ratio is 2.7, and consequently the difference in the relative variability of Information test scores of the 7-year-olds and the 25 to 29 age group is significantly greater than zero. We can reject the null hypothesis with confidence and conclude that the relative variability of the older age group is smaller than that of the younger. Taken in conjunction with the Tests of Significance in Section G, the results do not support the hypothesis that variability on the Information test of the Wechsler-Bellevue Scale increases with age.
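The arithmetic behind the T of 2.7 can be sketched in Python. This is a minimal sketch that assumes the large-sample standard error of V, σ_V = [V/√(2N)]·√(1 + 2(V/100)²) with V in percent, together with the independent-samples Formula 14:17; the function names are illustrative. With N_x = 50 and N_y = 125 it gives a T ratio of about 2.65, which rounds to the 2.7 reported above.

```python
import math


def cv_percent(sd, mean):
    """Pearson's Coefficient of Relative Variation, V = 100 * sigma / M."""
    return 100 * sd / mean


def se_cv(v, n):
    """Assumed large-sample standard error of V (V in percent)."""
    return v / math.sqrt(2 * n) * math.sqrt(1 + 2 * (v / 100) ** 2)


v_x = cv_percent(1.11, 2.5)    # 7-year-olds, N = 50
v_y = cv_percent(2.98, 10.1)   # 25 to 29 age group, N = 125

# Independent samples: the correlation term of Formula 14:16 is zero
se_diff = math.sqrt(se_cv(v_x, 50) ** 2 + se_cv(v_y, 125) ** 2)
t = ((v_x - v_y) - 0) / se_diff
print(f"V_x = {v_x:.1f}%, V_y = {v_y:.1f}%, T = {t:.2f}")
```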

I. TESTS OF SIGNIFICANCE FOR A DIFFERENCE BETWEEN
PRODUCT-MOMENT COEFFICIENTS OF CORRELATION 

The standard error of a difference between correlation coefficients obtained from independent samples is relatively simple to compute, because the third term of Formula 14:1 will be zero and the formula simplifies to the following:

$$\sigma_{(r_{12} - r_{34})} = \sqrt{\sigma_{r_{12}}^2 + \sigma_{r_{34}}^2} = \sqrt{\left(\frac{1 - r_{12}^2}{\sqrt{N_{12}}}\right)^2 + \left(\frac{1 - r_{34}^2}{\sqrt{N_{34}}}\right)^2}$$

[14:18] Standard error of the difference between product-moment correlation coefficients, derived from non-correlated samples
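Formula 14:18 can be sketched as a small Python function, assuming the large-sample standard error of r, (1 − r²)/√N; the function names are illustrative. As a check, the radio-program coefficients discussed later in this section (r = .74 and r = .45, each with N = 79) give about .103, the value the text obtains for those data when the third term is dropped.

```python
import math


def se_r(r, n):
    """Large-sample standard error of a product-moment r: (1 - r**2) / sqrt(N)."""
    return (1 - r**2) / math.sqrt(n)


def se_diff_r_independent(r_12, n_12, r_34, n_34):
    """Formula 14:18: standard error of r_12 - r_34 for non-correlated samples."""
    return math.sqrt(se_r(r_12, n_12) ** 2 + se_r(r_34, n_34) ** 2)


print(f"{se_diff_r_independent(0.74, 79, 0.45, 79):.3f}")
```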





where the subscript 12 designates the first two variables correlated and the subscript 34 the second two variables correlated. (If the second pair of variables includes either of the first two, the subscript will be 13 or 23, and there is a possible correlation between the two sample results.)

Formula 14:18 has the same limitations as the standard error of a correlation coefficient (Formula 13:16). As the value of r increases, the estimates of the standard error are increasingly unsatisfactory. We saw that Fisher’s z function can be used to test the significance of, and to establish confidence limits for, high values of r. Similarly, the significance of differences involving high values of r can be tested in terms of z. The standard error for differences in z is equal to:

$$\sigma_{(z_{12} - z_{34})} = \sqrt{\sigma_{z_{12}}^2 + \sigma_{z_{34}}^2} = \sqrt{\frac{1}{N_{12} - 3} + \frac{1}{N_{34} - 3}}$$

[14:19] Standard error of the difference between z's derived from uncorrelated samples


where the first term of the formula is the variance of the z function for the first two variables correlated, and the second term is the variance of this function for the second two variables correlated. The distribution of differences between z's is approximately normal, and hence a Test of Significance based upon this formula can be interpreted, for samples of over 25 cases, by means of the table of the normal probability integral (Table I, Appendix B).
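A z-based test along the lines of Formula 14:19 might look like the Python sketch below. It assumes Fisher's transformation z = ½ ln[(1 + r)/(1 − r)], which is what `math.atanh(r)` computes. The coefficients and sample sizes in the example (r = .90 and r = .80, each from 103 cases) are hypothetical, chosen only to illustrate the arithmetic.

```python
import math


def fisher_z(r):
    """Fisher's z function: 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)


def t_ratio_z_difference(r_12, n_12, r_34, n_34):
    """T ratio for z_12 - z_34 from uncorrelated samples (Formula 14:19)."""
    se_diff = math.sqrt(1 / (n_12 - 3) + 1 / (n_34 - 3))
    return ((fisher_z(r_12) - fisher_z(r_34)) - 0) / se_diff


# Hypothetical example: r = .90 and r = .80, each from 103 cases
t = t_ratio_z_difference(0.90, 103, 0.80, 103)
print(f"T = {t:.2f}")
```

The resulting T, about 2.64, would then be read against the normal probability table as the paragraph above describes.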

Research that calls for a comparison of correlation coefficients is often based upon dependent rather than independent samples, and consequently the third term of the general formula for the standard error of a difference may be required. The exception for dependent samples arises when the correlation between the samples is positive and the T ratio obtained without the third term is already equal to or greater than 2.5. As pointed out previously, in such cases the third term of the general formula serves to reduce the estimated standard error of the difference, and hence increases the size of the test ratio; omitting it therefore errs on the side of caution.

The standard error of the difference between correlation coefficients derived from dependent samples differs,* depending on whether one array is or is not common to the two sets of bi-variates. In the first case:

$$\sigma_{(r_{12} - r_{13})} = \sqrt{\sigma_{r_{12}}^2 + \sigma_{r_{13}}^2 - 2r_{(r_{12}r_{13})}\,\sigma_{r_{12}}\sigma_{r_{13}}}$$

[14:20] Standard error of the difference between correlation coefficients derived from bi-variate samples with one array in common

If there is no common array:

$$\sigma_{(r_{12} - r_{34})} = \sqrt{\sigma_{r_{12}}^2 + \sigma_{r_{34}}^2 - 2r_{(r_{12}r_{34})}\,\sigma_{r_{12}}\sigma_{r_{34}}}$$

[14:21] Standard error of the difference between correlation coefficients derived from dependent samples but with no array in common

* Cf. C. C. Peters and W. R. Van Voorhis, op. cit., pp. 195-199.





The correlation between the correlation coefficients for the third term of each of these formulas differs and is laborious to compute. Nor will a conversion to z help in this particular situation, because no formula is available for estimating the correlation between z functions.

The attitudes of a group of 79 listeners toward a radio program were obtained in the Program Analyzer Laboratories of the Columbia Broadcasting System.* The two major parts of the program consisted of comedy dialogue and a group of songs. The listeners’ attitudes, expressed during the program, were correlated with their responses to an opinion questionnaire administered at the end of the broadcast, in which the subjects were asked whether they would have turned the program off if they had been at home (unfavorable response) or would have listened to the end (favorable response). The following results were obtained: The correlation between questionnaire responses (variable 1) and attitudes toward the comedy dialogue (variable 2) was .74 ($r_{12}$), whereas the correlation between questionnaire responses (variable 1) and attitudes toward the songs (variable 3) was only .45 ($r_{13}$). These results suggest that the group’s over-all opinion of the program was affected more by the comedy dialogue than by the songs. However, it is relevant to ascertain, by means of a Test of Significance, whether the difference between the two correlation coefficients is significant, or whether it might be expected on the basis of chance.

We shall first present a Test of Significance using Formula 14:20, which requires the correlation between attitudes toward the comedy dialogue (variable 2) and the songs (variable 3). This correlation, $r_{23}$, is .52. As indicated in Formula 14:20, for the Test of Significance we shall need the correlation between the correlation coefficients, viz., $r_{(r_{12}r_{13})}$. This is equal to .50:




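$$r_{(r_{12}r_{13})} = r_{23} - \frac{r_{12}r_{13}\left(1 - r_{23}^2 - r_{12}^2 - r_{13}^2 + 2r_{23}r_{12}r_{13}\right)}{2(1 - r_{12}^2)(1 - r_{13}^2)} = .52 - \frac{(.74)(.45)\left[1 - .52^2 - .74^2 - .45^2 + 2(.52)(.74)(.45)\right]}{2(1 - .74^2)(1 - .45^2)}$$

[14:22] Correlation between correlation coefficients whose bi-variates have one array in common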


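The standard error of the difference between $r_{12}$ and $r_{13}$ is therefore:

$$\sigma_{(r_{12} - r_{13})} = \sqrt{\left(\frac{1 - .74^2}{\sqrt{79}}\right)^2 + \left(\frac{1 - .45^2}{\sqrt{79}}\right)^2 - 2(.50)\left(\frac{1 - .74^2}{\sqrt{79}}\right)\left(\frac{1 - .45^2}{\sqrt{79}}\right)} = .077$$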


And the Test of Significance for the null hypothesis, i.e., that the parameter difference is zero, is:

$$T = \frac{(r_{12} - r_{13}) - 0}{\sigma_{(r_{12} - r_{13})}} = \frac{.74 - .45}{.077} = \frac{.29}{.077} = 3.8$$

where .29 is the difference between the two correlation coefficients; zero is the parameter value of the hypothesis tested; .077 is the estimated value of the standard error of the difference between the two correlation coefficients; and 3.8 is the test ratio.

* Cf. J. G. Peatman and T. Hallonquist, op. cit.

The Test of Significance for the difference between these two correlation coefficients yields a T ratio greater than both 2.5 and 3.0. Hence, we can be confident that the difference is significantly greater than zero, i.e., that it is not merely a chance difference. However, the statistical results of a Test of Significance do not in themselves indicate the nature of, or the reasons for, the relationship. When we can definitely reject the null hypothesis, as in this case, we examine the character of possible extra-chance factors responsible for the difference. Any inferences of causal relations must be based on an analysis of the character of the data correlated. In this case, knowledge of the nature and sequence of events, together with interviews with the subjects, warrants the inference that the subjects’ opinions of the program as a whole were determined more by their attitudes toward the comedy dialogue than by their attitudes toward the songs.

We shall now present a Test of Significance for these data using only the variances of each coefficient (Formula 14:18), in order to see whether the labor involved in computing the third term of Formula 14:20 was necessary in this particular case. The estimate of the standard error of the difference between the two correlation coefficients is now as follows:

$$\sigma_{(r_{12} - r_{13})} = \sqrt{\left(\frac{1 - .74^2}{\sqrt{79}}\right)^2 + \left(\frac{1 - .45^2}{\sqrt{79}}\right)^2} = .103$$

The standard error is increased somewhat, from .077 to .103. Hence, the T ratio below differs from the T of 3.8 obtained with the complete standard error formula:

$$T = \frac{(r_{12} - r_{13}) - 0}{.103} = \frac{.29}{.103} = 2.8$$

Thus, in this particular case the complete Formula 14:20, including the third term, yields a T ratio greater than 3.0, whereas the abbreviated formula gives a T ratio less than 3.0 but greater than 2.5. The null hypothesis can of course be more confidently rejected with a T ratio of 3.8 than with one of 2.8.
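Both versions of the test can be reproduced with a short Python sketch (function names illustrative). It assumes the large-sample standard error (1 − r²)/√N for each coefficient and takes the correlation between the coefficients as the .50 reported in the text; setting that correlation to zero gives the abbreviated formula. Carried to more decimals, the full formula gives a standard error of about .078 and a T of about 3.7; the 3.8 above reflects rounding the standard error to .077.

```python
import math


def se_r(r, n):
    """Large-sample standard error of r: (1 - r**2) / sqrt(N)."""
    return (1 - r**2) / math.sqrt(n)


def se_diff_r_common_array(r_12, r_13, n, r_between):
    """Formula 14:20: standard error of r_12 - r_13 when one array is common.

    r_between is the correlation between the two coefficients; 0 reduces
    this to the abbreviated Formula 14:18.
    """
    s_12, s_13 = se_r(r_12, n), se_r(r_13, n)
    return math.sqrt(s_12**2 + s_13**2 - 2 * r_between * s_12 * s_13)


r_12, r_13, n = 0.74, 0.45, 79
for label, r_between in [("full", 0.50), ("abbreviated", 0.0)]:
    se = se_diff_r_common_array(r_12, r_13, n, r_between)
    print(f"{label}: SE = {se:.3f}, T = {(r_12 - r_13) / se:.2f}")
```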

EXERCISES 

1. What hypothesis is usually considered in a Test of Significance for a difference between two statistics? Why?

2. Under what circumstances is it appropriate to omit the third term of Formula 14:1? 

3. Compare the logic underlying a Test of Significance for a statistic with that for a 
difference between two statistics. 

4. Using the data in Table 5:14, determine the percentage of college freshmen with an intelligence test score above 75, the percentage of their best friends with an intelligence test score above 75, and the significance of the difference between these two percentages.

5. Using the same data, determine whether the difference in the means, variability, 
and relative variability of the freshmen and their best friends is significant for each 
of the following variables: 

a. average grades 

b. intelligence test scores 

c. ages 

(Note that the variables to be compared are not derived from independent samples.) 

6. Using the data in Table 14:1, determine whether the difference between the means 
and between the standard deviations on the Bellevue-Wechsler scale is significant 
for the following age groups: 

a. 25 to 29 and the 35 to 39 age group 

b. 25 to 29 and the 50 to 59 age group 

7. Using the same data, combine the results for the first three age groups (25 to 39) 
and determine whether the difference between the mean and standard deviation of 
this combined group is significantly different from the mean and standard deviation 
of the combined 40 to 59 age group. 

8. Using the correlation coefficients obtained in Exercise 12, Chapter 9, for the data in Table 5:14, determine whether the difference in the correlations between average grades and intelligence test scores is significant.



CHAPTER 15 


Chi-Square and Tests of Significance 


Karl Pearson’s chi-square technique is a statistical method for the testing of hypotheses concerning distributions of frequencies. Since categorical data consist basically of frequencies, chi-square is especially useful in testing hypotheses about such data. However, it can also be applied to classes of frequencies derived from variables.

The statistical hypotheses which can be tested by chi-square are many, the restrictions being mainly of two kinds. First, as already indicated, the hypotheses must concern statistical frequencies of categories or classes. Chi-square is not directly applicable to hypotheses involving other kinds of data or statistical measures, but it can be adapted to proportions or percentages. Second, it is usually not a reliable technique if the number of hypothetical frequencies for any class or category is less than 10.* It should also be emphasized that the size of the sample, from which the frequencies per class are derived, should be fairly large.† Finally, the technique is based on the assumption that the frequencies of each class are independent of each other.

Except for these limitations, chi-square provides a general technique of analysis. The number of statistical hypotheses that can be tested is limited only by the total number of frequencies in a sample. Thus, a group of empirical data can be tested for their possible divergence (on the basis of chance) from any hypothetical grouping of N frequencies, as long as there are no fewer than 10 hypothetical frequencies per class. If, for example, a random sample of 100 people consists of 60 men and 40 women, we can test the hypothesis that the division of frequencies in this sample is only a chance deviation from a hypothetical universe in which the sexes are evenly divided, i.e., 50% men and 50% women. With the chi-square technique we can obtain an estimate of the probability of a sample result of 40 or fewer women and 60 or more men for a universe in which the sexes are equally divided. We can also test the hypothesis that the universe is divided in the proportion of .75 men and .25 women, .65 women and .35 men, etc. In such cases, the proportionate values of each class are converted into frequencies on the basis of $N_s$, the size of the sample.

* G. U. Yule and M. G. Kendall, An Introduction to the Theory of Statistics, Charles Griffin & Co., London, 12th ed., 1940. According to these two authors (p. 422), “No theoretical cell frequency should be small... 5 should be regarded as the very minimum and 10 is better.” R. A. Fisher likewise agrees that 5 is the minimum. (Statistical Methods for Research Workers, Oliver & Boyd, London, 7th ed., 1938, p. 87.)

† “N should be at least 50, however few the number of cells.” (Yule and Kendall, op. cit., p. 422.)
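Converting hypothetical proportions into frequencies is simple multiplication by $N_s$. Here is a minimal Python sketch (the function name is illustrative) for the sex-ratio hypotheses mentioned above, with a sample of 100.

```python
def hypothetical_frequencies(n_s, proportions):
    """Convert hypothetical class proportions into expected frequencies for a sample of n_s."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    return [p * n_s for p in proportions]


print(hypothetical_frequencies(100, [0.50, 0.50]))  # men, women: evenly divided
print(hypothetical_frequencies(100, [0.75, 0.25]))  # .75 men, .25 women
```

The check on the sum reflects the one constraint every such hypothesis carries: the hypothetical frequencies must add up to the size of the sample.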

Regardless of the particular hypothesis tested, the chi-square technique 
consists in a comparison of the chance implications of the hypothesis with 
the sample result. If the sample result could be expected to occur in random 
samples of the hypothetical universe on the basis of chance alone, it is a 
chance implication of the hypothesis and cannot be rejected. On the other 
hand, if the sample result cannot be interpreted as a chance implication of the 
hypothesis, the latter can be rejected. As R. A. Fisher puts it, the facts are 
given the opportunity to disprove a hypothesis. If a hypothesis is disproved, 
the possible implications of this fact are considered. 

A. CHI-SQUARE FOR THE DISTRIBUTION OF NON-VARIABLE 
AND VARIABLE ATTRIBUTES 

Calculation of Chi-Square 

The first step in computing chi-square is to establish for each sample category or class the number of frequencies which would be expected on the basis of the hypothesis to be tested. In other words, a relevant hypothesis stated in terms of frequencies per category must first be set up and then tested. Some hypotheses are, of course, more relevant than others. Very often the most relevant hypothesis is the null hypothesis of a purely chance distribution of frequencies into two or more classes. Thus, in the case of a coin assumed to be fair, a hypothetical distribution of frequencies for heads and tails is equally divided into two independent categories. In the case of a die, there would be six independent categories of events, and a chance distribution of the frequencies of each would be $\frac{1}{6}N_s$, where $N_s$ is the size of the sample. Similarly, people’s attitudes toward an event might be assumed, on the basis of chance alone, to be dichotomized into two categories, each equal in size.

What the chi-square technique does is to compare the divergence or deviation of the sample frequencies per category from the hypothetical frequencies for each category studied. The greater the difference between sample frequencies and hypothetical frequencies per category, the less the probability that the differences are attributable only to chance errors of sampling and measurement. Chi-square itself is a measure that expresses the extent of the differences between hypothetical and sample results. Once the value of chi-square for a given hypothesis has been computed, the probability that differences as great as those between the sample and the hypothetical frequencies per category or class would occur on the basis of chance alone can be estimated. In the light of this probability value, the hypothesis can then be rejected or not rejected, depending on the confidence criteria used in judging the character of the result, i.e., whether it is a likely or an unlikely result for the hypothesis.

Once the hypothetical frequencies for a given problem are set up, there 
remain the following relatively simple steps in computing chi-square: 

1. The difference between the hypothetical and the sample frequencies is determined for each category or class.

2. Each of the differences per category or class is squared. 

3. The ratio of the resulting squares to the hypothetical frequency per 
category or class is obtained. 

4. These ratios are added to give the value of chi-square for the hypothesis tested. Thus, by formula:

$$\chi^2 = \Sigma\left[\frac{(f_s - f_h)^2}{f_h}\right]$$

[15:1] Chi-square

where $f_s$ is the number of sample frequencies per category or class; $f_h$ is the number of hypothetical frequencies for the corresponding categories or classes; and Σ symbolizes the process of summing all the ratios for the categories or classes under consideration.
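The four steps and Formula 15:1 translate directly into code. Here is a minimal Python sketch (the function name is illustrative), applied to the brand-preference data discussed next and to the 60 men / 40 women example given earlier under an even-division hypothesis.

```python
def chi_square(sample_freqs, hypothetical_freqs):
    """Formula 15:1: sum of (f_s - f_h)**2 / f_h over all categories."""
    if len(sample_freqs) != len(hypothetical_freqs):
        raise ValueError("need one hypothetical frequency per category")
    total = 0.0
    for f_s, f_h in zip(sample_freqs, hypothetical_freqs):
        diff = f_s - f_h            # step 1: difference per category
        total += diff**2 / f_h      # steps 2-4: square, divide by f_h, add
    return total


print(chi_square([600, 400], [500, 500]))  # brand preferences
print(chi_square([60, 40], [50, 50]))      # 60 men, 40 women vs. 50/50
```

The signs of the differences drop out on squaring, so the function needs no special handling for categories where the sample falls short of the hypothesis.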


A Chi-Square Test of Significance of Consumers’ Brand 
Preferences (a Dichotomy) 

A random sample of interviews with 1000 housewives gives the following results:

Housewives’ preferences for Brand A = 600 ($f_s$)
Housewives’ preferences for Brand B = 400 ($f_s$)

A relevant hypothesis here is a chance distribution of housewives’ preferences into two categories, viz., 50% for Brand A and 50% for Brand B. Since a hypothesis for a chi-square test is stated directly in terms of frequencies, the division of hypothetical frequencies would be as follows:

Brand A, 500 preferences ($f_h$)
Brand B, 500 preferences ($f_h$)

If this hypothesis can be rejected with confidence, the conclusion follows that 
the 600 preferences for Brand A are not merely a chance result but rather 
indicate that a majority (more than 50%) of all housewives of the universe 
sampled prefer that brand. 

Chi-square for this hypothesis is computed as shown in Table 15:1. The chi-square value is found to be 40. However, is a chi-square value as great as 40 likely for the hypothesis tested, on the basis of chance alone? The greater the differences between sample and hypothetical frequencies, the greater the value of χ² and the less the likelihood of their chance occurrence.





Table 15:1. Computation of Chi-Square for a Test of Significance of Consumers’ Brand Preferences

Columns: (1) Sample of Consumers' Brand Preferences, $f_s$; (2) Frequencies by Hypothesis, $f_h$; (3) Differences,* $f_s - f_h$; (4) Differences Squared, $(f_s - f_h)^2$; (5) Chi-Square Ratio, $(f_s - f_h)^2/f_h$

              (1)     (2)     (3)      (4)       (5)
Brand A       600     500     100    10,000      20
Brand B       400     500     100    10,000      20
Total        1000    1000                        χ² = 40

From Table 15:2: for 1 d.f., P = .001 for χ² = 10.83

* The signs of the differences, $(f_s - f_h)$, can be neglected because the differences are squared.


The Probability of Chi-Square 

Sampling distributions of the chi-square statistic are not of the normal, bell-shaped type unless there are around 30 classes or categories. For most Tests of Significance in terms of chi-square, the number of classes or categories is considerably less, often only 2, as in Table 15:1. The form of the sampling distribution varies considerably for 2, 3, 4, etc., classes up to 30. For 1 degree of freedom, the sampling distribution is a curve like that shown in Fig. 15:1, in which the ordinate represents the frequency of the sample results and the abscissa represents the value of χ². When there are 2 degrees of freedom, the form of the sampling distribution is like that in Fig. 15:2, which is similar to a dichotomized half of the standard normal probability curve.

[Figs. 15:1, 15:2, and 15:3: sampling distributions of χ² for 1, 2, and 3 degrees of freedom]

The mode of both these sampling distributions is at a χ² value of zero. But when there are 3 degrees of freedom, the mode shifts from zero and the curve is extremely skewed, as in Fig. 15:3. As the number of classes or categories increases, the form of the sampling distribution gradually approaches the normal, bell-shaped curve.

What we need for chi-square, therefore, are probability values for categories or classes ranging from 2 to 30. Beyond 30, the implications of the normal probability curve can be utilized. The probability values required are presented in Table 15:2.† This table is set up differently from Table II, Appendix B, for T of large sample theory, in that the body of the table consists of chi-square values rather than P values, the 11 values of P being given at the heads of the columns. Furthermore, these probability values are developed in terms of degrees of freedom* rather than of the total sample frequencies, $N_s$.

† R. A. Fisher and F. Yates, Statistical Tables for Biological, Agricultural, and Medical Research, Oliver & Boyd, London, 1938, Table IV, p. 27.


Degrees of Freedom (d.f.)

The concept of degrees of freedom is analogous in its implications to $N_s$. However, in the case of chi-square the P values in Table 15:2 were originally developed not for $N_s$ but in terms of the number of classes or categories for which frequency values based on the hypothesis could be freely assigned.†

The number of degrees of freedom (d.f.) for any hypothesis is equal to the number of categories or classes for which hypothetical frequency values can be freely assigned. This means that the number of degrees of freedom is equal to the total number of categories or classes minus the number of constraints imposed upon the data in establishing the hypothetical frequencies. In Table 15:1 there was one constraint, viz., that the sum of the hypothetical frequencies be equal to $N_s$, the size of the sample. Thus, as soon as a hypothetical value of 500 was set up for Brand A, the number of frequencies for Brand B had to be taken as 500, because the total number of frequencies (500 and 500) must equal 1000, the number of observations in the sample. Since there were 2 categories, the number of degrees of freedom is therefore 1.

Since this type of constraint always enters into the determination of the hypothetical frequencies for one class or category in any problem, d.f. is always at least one less than the total number of classes or categories. We shall see later that additional constraints are sometimes imposed in setting up hypothetical frequencies. For each additional constraint, an additional degree of freedom is lost. Therefore,

d.f. = (number of classes or categories) − (number of constraints imposed by the hypothesis)

Table 15:2 gives the distribution of chi-square values for degrees of freedom from 1 to 30, and for 11 P values. In the first row of the body of the table are the chi-square values to be expected for sampling distributions with 1 degree of freedom. When d.f. = 1, the probabilities are at least 99 in 100 (P = .99) of obtaining, on the basis of random sampling, a chi-square value equal to or greater than .00. According to the last column of the first row, when d.f. = 1 the probabilities are only 1 in 1000 (P = .001) of obtaining a sample result in which chi-square is equal to or greater than 10.83.
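Reading Table 15:2 amounts to finding the smallest tabled P whose chi-square value the sample result equals or exceeds. The Python sketch below shows one way that lookup might be coded; it is an illustrative assumption, not a method from the text. It carries only the first three rows of Table 15:2, and the function names are hypothetical.

```python
# First three rows of Table 15:2: chi-square values at each tabled P
P_LEVELS = [.99, .95, .90, .50, .30, .20, .10, .05, .02, .01, .001]
TABLE_15_2 = {
    1: [.00, .00, .02, .46, 1.07, 1.64, 2.71, 3.84, 5.41, 6.64, 10.83],
    2: [.02, .10, .21, 1.39, 2.41, 3.22, 4.60, 5.99, 7.82, 9.21, 13.82],
    3: [.12, .35, .58, 2.37, 3.66, 4.64, 6.25, 7.82, 9.84, 11.34, 16.27],
}


def degrees_of_freedom(n_categories, n_constraints=1):
    """d.f. = number of categories minus number of constraints imposed."""
    return n_categories - n_constraints


def p_at_most(chi2, df):
    """Smallest tabled P whose chi-square value chi2 equals or exceeds.

    The chance probability of a chi-square this large is then at most that P.
    Returns None if chi2 falls below the P = .99 entry.
    """
    bound = None
    for p, critical in zip(P_LEVELS, TABLE_15_2[df]):
        if chi2 >= critical:
            bound = p
    return bound


df = degrees_of_freedom(2)        # Brand A / Brand B, one constraint
print(df, p_at_most(40.0, df))    # chi-square from Table 15:1
```

For the brand-preference result, χ² = 40 with 1 d.f. exceeds 10.83, so P is .001 or less.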


* H. M. Walker, “Degrees of Freedom,” Journal of Educational Psychology, 31:253-269, 1940.

† Cf. Karl Pearson, Tables for Statisticians and Biometricians, Cambridge University Press, Cambridge, 1914, pp. xxxi-xxxiii, 26-28.





Table 15:2. Distribution of Chi-Square *

Probability Values for Chi-Square with Degrees of Freedom from 1 to 30

                                  Probability, P
d.f.    .99    .95    .90    .50    .30    .20    .10    .05    .02    .01   .001
   1    .00    .00    .02    .46   1.07   1.64   2.71   3.84   5.41   6.64  10.83
   2    .02    .10    .21   1.39   2.41   3.22   4.60   5.99   7.82   9.21  13.82
   3    .12    .35    .58   2.37   3.66   4.64   6.25   7.82   9.84  11.34  16.27
   4    .30    .71   1.06   3.36   4.88   5.99   7.78   9.49  11.67  13.28  18.46
   5    .55   1.14   1.61   4.35   6.06   7.29   9.24  11.07  13.39  15.09  20.52
   6    .87   1.64   2.20   5.35   7.23   8.56  10.64  12.59  15.03  16.81  22.46
   7   1.24   2.17   2.83   6.35   8.38   9.80  12.02  14.07  16.62  18.48  24.32
   8   1.65   2.73   3.49   7.34   9.52  11.03  13.36  15.51  18.17  20.09  26.12
   9   2.09   3.32   4.17   8.34  10.66  12.24  14.68  16.92  19.68  21.67  27.88
  10   2.56   3.94   4.86   9.34  11.78  13.44  15.99  18.31  21.16  23.21  29.59
  11   3.05   4.58   5.58  10.34  12.90  14.63  17.28  19.68  22.62  24.72  31.26
  12   3.57   5.23   6.30  11.34  14.01  15.81  18.55  21.03  24.05  26.22  32.91
  13   4.11   5.89   7.04  12.34  15.12  16.98  19.81  22.36  25.47  27.69  34.53
  14   4.66   6.57   7.79  13.34  16.22  18.15  21.06  23.68  26.87  29.14  36.12
  15   5.23   7.26   8.55  14.34  17.32  19.31  22.31  25.00  28.26  30.58  37.70
  16   5.81   7.96   9.31  15.34  18.42  20.46  23.54  26.30  29.63  32.00  39.25
  17   6.41   8.67  10.08  16.34  19.51  21.62  24.77  27.59  31.00  33.41  40.79
  18   7.02   9.39  10.86  17.34  20.60  22.76  25.99  28.87  32.35  34.80  42.31
  19   7.63  10.12  11.65  18.34  21.69  23.90  27.20  30.14  33.69  36.19  43.82
  20   8.26  10.85  12.44  19.34  22.78  25.04  28.41  31.41  35.02  37.57  45.32
  21   8.90  11.59  13.24  20.34  23.86  26.17  29.62  32.67  36.34  38.93  46.80
  22   9.54  12.34  14.04  21.34  24.94  27.30  30.81  33.92  37.66  40.29  48.27
  23  10.20  13.09  14.85  22.34  26.02  28.43  32.01  35.17  38.97  41.64  49.73
  24  10.86  13.85  15.66  23.34  27.10  29.55  33.20  36.42  40.27  42.98  51.18
  25  11.52  14.61  16.47  24.34  28.17  30.68  34.38  37.65  41.57  44.31  52.62
  26  12.20  15.38  17.29  25.34  29.25  31.80  35.56  38.88  42.86  45.64  54.05
  27  12.88  16.15  18.11  26.34  30.32  32.91  36.74  40.11  44.14  46.96  55.48
  28  13.56  16.93  18.94  27.34  31.39  34.03  37.92  41.34  45.42  48.28  56.89
  29  14.26  17.71  19.77  28.34  32.46  35.14  39.09  42.56  46.69  49.59  58.30
  30  14.95  18.49  20.60  29.34  33.53  36.25  40.26  43.77  47.96  50.89  59.70

* Table 15:2 is abridged from Table IV of Fisher: Statistical Tables for Biological,
Agricultural and Medical Research, Oliver & Boyd, Ltd., Edinburgh, by permission of the
author and publishers.


The value of chi-square in Table 15:1 was 40. If Table 15:2 were extended to
smaller P values, we would see that the probability of a chi-square value as
great as 40 is very small. Even as it stands, the table shows that when
d.f. = 1 there is only 1 chance in 1000 of obtaining a chi-square equal to or
greater than 10.83, and this is the 0.1% confidence level developed in
Chapter 12. Since 40 far exceeds 10.83, we can reject the hypothesis and
conclude that it is likely that a majority of the universe of housewives
prefer Brand A.
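The P values of Table 15:2 can also be computed directly. The Python sketch below is an illustration added here, not the author's procedure: the function `chi2_upper_tail` (our own name) returns the probability of a chi-square equal to or greater than a given value, assuming an integer number of degrees of freedom, by using the standard closed forms of the regularized incomplete gamma function. It relies only on the standard `math` module.

```python
import math


def chi2_upper_tail(x: float, df: int) -> float:
    """Probability of a chi-square equal to or greater than x, for integer d.f.

    Uses the closed forms of the regularized upper incomplete gamma
    function Q(df/2, x/2), which exist when df is a positive integer.
    """
    if df < 1:
        raise ValueError("d.f. must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even d.f.: Q(m, y) = e^(-y) * sum_{k=0}^{m-1} y^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= y / k
            total += term
        return math.exp(-y) * total
    # Odd d.f.: Q(m + 1/2, y) = erfc(sqrt(y)) + e^(-y) * sum_{k=1}^{m} y^(k - 1/2) / Gamma(k + 1/2)
    total = math.erfc(math.sqrt(y))
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (k - 0.5) / math.gamma(k + 0.5)
    return total


# Spot checks against Table 15:2 and the Brand A example (chi-square = 40, d.f. = 1).
for chi_sq, df in [(3.84, 1), (10.83, 1), (13.82, 2), (24.72, 11), (40.0, 1)]:
    print(f"d.f. = {df:2d}, chi-square = {chi_sq:5.2f}: P = {chi2_upper_tail(chi_sq, df):.3g}")
```

Each of the first four spot checks should give a P close to the column heading under which the value appears in Table 15:2 (.05, .001, .001, and .01), while the chi-square of 40 gives a probability far smaller than .001.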








CHI-SQUARE AND TESTS OF SIGNIFICANCE 


Chi-Square as a Test of Significance 

We have seen that a Test of Significance yields a test ratio, T, as follows:

$$T = \frac{s - h}{\sigma_s}$$

where s is the sample value of a statistic; h is the parameter value by
hypothesis; and σ_s is the standard deviation of the sampling distribution of the
statistic. When the sampling distribution can be assumed to have the form 
of the standard, bell-shaped normal probability curve, the P value of the 
T ratio can be obtained directly from the table of normal probability values 
(Table II, Appendix B). 

The value of χ² is analogous in its logical implications to a test ratio; that
is, the chi-square statistic is in itself a Test of Significance. Thus

$$\chi^2 = \sum \frac{(f_s - f_h)^2}{f_h}$$

that is, the sum, over all categories, of the squared differences between the
sample and theoretical frequencies, each divided by its theoretical frequency.

Instead of the sampling distribution of χ² being measured in terms of σ_s, as
in the case of T, it is set up in terms of frequencies. Just as a T ratio of 2.0
indicates a sample result 2 standard deviations above the parameter mean
of the sampling distribution, so χ² is a measure of distance on the abscissa
of its own sampling distribution: it measures the difference between a χ² of
zero (which signifies no difference between the sample result and the
hypothesis) and the χ² value of the sample result. Because the form of the
sampling distribution of chi-square varies with the number of degrees of
freedom, the values of χ² corresponding to selected P values (confidence
levels) are tabled separately for each d.f. from 1 to 30, as shown in
Table 15:2.

Chi-Square for d.f. > 30 

We have said that when there are more than 30 degrees of freedom, the
sampling distribution of chi-square is similar in form to the standard, normal
probability curve. Fisher * has shown that in such cases the expression
$\sqrt{2\chi^2} - \sqrt{2(\text{d.f.}) - 1}$ is approximately normally distributed
about zero with a standard error of 1.0. Thus for a problem with 35 d.f. and
χ² equal to 65, the Test of Significance is set up in terms of T for large
sample theory, as follows:

$$T = \frac{s - h}{\sigma_s} = \frac{\left(\sqrt{2\chi^2} - \sqrt{2(\text{d.f.}) - 1}\right) - 0}{1.0} \qquad [15:2]$$
(Chi-square Test of Significance when d.f. is more than 30)

$$T = \sqrt{2(65)} - \sqrt{70 - 1} = 11.4 - 8.3 = 3.1$$


A T ratio of 3.1 (Table II, Appendix B) signifies a sample result that will 
occur less than 1 time in 1000 in random sampling. Hence such a result is 
extremely unlikely and the hypothesis can be rejected with confidence. 
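Formula [15:2] is easily evaluated numerically. In this Python sketch the function names are our own; the one-tailed (upper) probability of the standard normal curve is used, on the assumption that only large values of chi-square count against the hypothesis, and `math.erfc` supplies the normal tail area in place of Table II.

```python
import math


def fisher_t(chi_sq: float, df: int) -> float:
    """Formula [15:2]: T = sqrt(2 * chi-square) - sqrt(2 * d.f. - 1), with h = 0 and sigma = 1.0."""
    return math.sqrt(2 * chi_sq) - math.sqrt(2 * df - 1)


def normal_upper_tail(t: float) -> float:
    """Probability of a standard normal deviate equal to or greater than t."""
    return 0.5 * math.erfc(t / math.sqrt(2))


t = fisher_t(65, 35)
p = normal_upper_tail(t)
print(f"T = {t:.2f}, one-tailed P = {p:.5f}")
```

For 35 d.f. and χ² = 65, T is about 3.1 and the one-tailed P is just under .001, in agreement with the text.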


* R. A. Fisher, op. cit., p. 85.





A Chi-Square Test of Significance for a Trichotomy 

Chi-square is particularly useful for testing hypotheses concerning
trichotomies for which the sample data cannot well be dichotomized and
analyzed by a Test of Significance for a percentage or a proportion. Amen’s data
on pre-school children’s responses to pictures in Table 2:1 are a case in point.
If we assume that the data of her 4-year-old group consisted of a sample of
99 responses, we can employ chi-square to determine whether or not their
distribution differed significantly from a purely chance division (one-third of
N_s, or 33, for each category). Since N_s is the only constraint, the number of
degrees of freedom is 3 − 1 = 2.

The sample and hypothetical frequencies per category are presented in
Table 15:3, and the value of χ² is found to be 14.79. According to Table 15:2,
a chi-square as great as this (it exceeds 13.82, the .001 value for 2 d.f.) will
occur in random sampling less than 1 time in 1000. Hence, the null (chance)
hypothesis can be rejected, and it can be concluded that a plurality of the
responses are of the outer activity type.

Table 15:3. Chi-Square for the Hypothesis of a Chance
Division of Frequencies in a Trichotomy

Category of        Sample     Chance        Differences   Differences     Chi-Square
Response,          Result     Hypothesis                  Squared         Ratio
Amen's Data        f_s        f_h           (f_s − f_h)   (f_s − f_h)²    (f_s − f_h)²/f_h

Static form          23         33              10            100            3.03
Outer activity       51         33              18            324            9.82
Inner activity       25         33               8             64            1.94

              N_s =  99         99                                     χ² = 14.79

From Table 15:2:
For 2 d.f., P = .001 for χ² ≥ 13.82
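The computation in Table 15:3 can be reproduced as follows. This is an illustrative Python sketch using the frequencies of the table; for the special case of 2 d.f. it uses the exact relation P = e^(−χ²/2) in place of a table lookup.

```python
import math

# Sample and chance-hypothesis frequencies from Table 15:3 (Amen's data, N_s = 99).
categories = ["Static form", "Outer activity", "Inner activity"]
f_s = [23, 51, 25]
f_h = [sum(f_s) / 3] * 3  # chance division: one-third of N_s, i.e. 33 per category

ratios = [(s - h) ** 2 / h for s, h in zip(f_s, f_h)]
chi_sq = sum(ratios)
df = len(categories) - 1  # the only constraint is N_s

# For exactly 2 d.f. the upper tail of chi-square is P = exp(-chi_sq / 2).
p = math.exp(-chi_sq / 2)

for name, r in zip(categories, ratios):
    print(f"{name:15s} {r:5.2f}")
print(f"chi-square = {chi_sq:.2f}, d.f. = {df}, P = {p:.4f}")
```

It gives χ² of about 14.79 and P of about .0006, below the .001 level.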


A Chi-Square Test of Significance for the Distribution of a Variate 

Tests of Significance for the skewness and kurtosis of unimodal distributions
were presented in Chapter 13. In effect, two of the fundamental properties of
the normal, bell-shaped curve, rather than the distribution as a whole, were
analyzed separately by the centile method. The parameter skewness of such a
distribution is taken as zero, and the parameter kurtosis (when measured in
terms of Q/D) as .263. Both of these tests, however, are approximations and
do not take into account the interval-by-interval differences between a
sample distribution and the normal, bell-shaped distribution. By means of the
chi-square statistic, it is possible to make a single Test of Significance for
the form or character of a variate distribution as a whole. A distribution of
any type can be set up by hypothesis, and the differences between sample
frequencies and hypothetical frequencies per class interval can be evaluated
by a chi-square Test of Significance.

To illustrate this procedure, we shall use the distribution of test scores on
page 435, for which measures of skewness and kurtosis were obtained by the
centile method and then evaluated in terms of appropriate Tests of Significance.
It will be recalled that the T ratio for skewness was 1.1 and for kurtosis
0.6. In the light of these two results, the hypothesis that the test might have
yielded a more nearly normal distribution with larger samples of students
was not rejected. We shall now use a chi-square Test of Significance to analyze
the divergence of the distribution itself (not simply its properties of skewness
and kurtosis) from the normal, bell-shaped distribution.

The Calculation of Hypothetical Frequencies per Class Interval for a Normal 
Distribution 

The first step in analyzing a frequency distribution with chi-square is to 
determine the number of frequencies for each class interval on the basis of 
the hypothesis to be tested. In other words, the hypothetical frequencies (f_h)
for each category or class (class interval in this case) must be determined, 
as was done in Tables 15:1 and 15:3. 

It is relatively easy to set up a normal distribution for a given number of
hypothetical frequencies equal to N_s, the size of a sample, if the intervals are
taken in z score units and only the frequencies at the mean and at the
midpoints of successive z score intervals are to be obtained. Table I, Appendix B,
which gives the ordinate values of a normal distribution whose total area is
taken as unity, can be used in computing the number of frequencies at any
x/σ point on the abscissa, once the number of frequencies at the mean is
determined. The latter is equal to the following:

$$y_0 = \frac{N_s}{\sigma'\sqrt{2\pi}} = \frac{N_s}{2.51\,\sigma'} \qquad [15:3]$$
(Number of frequencies at the mean of a finite distribution taken by hypothesis as normal)

where N_s is the size of the sample, σ' is the standard deviation of the
distribution in unit step-intervals (that is, in class-interval units), the
constant π equals 3.1416, and √(2π) is 2.51.

The mean of the distribution of 250 test scores is 90.66 (or 90.7) and its 
standard deviation is 5.70. Since the size of the original class intervals of the 
data is 2.0 units (the distribution is given in Table 15:4): 

σ' = σ/i = 5.70/2.0 = 2.85

Hence the hypothetical number of frequencies at the mean is:

$$y_0 = \frac{250}{(2.51)(2.85)} = 34.9$$


The number of frequencies at successive z score intervals above and below
the mean can now be readily determined, since the ratio of the height of the
ordinate at any point to the height at the mean is a fixed proportion (see
Table IA, Appendix B, page 511). By means of this table, the frequencies at
other points are found to be equal to the following:

y at mean  = 34.9 frequencies
y at ±0.5σ = .88(34.9) = 30.7
y at ±1.0σ = .61(34.9) = 21.3
y at ±1.5σ = .32(34.9) = 11.2
y at ±2.0σ = .14(34.9) = 4.9
y at ±2.5σ = .04(34.9) = 1.4
y at ±3.0σ = .01(34.9) = 0.3
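Formula [15:3] and the ordinates listed above can be sketched in Python as follows. The sketch assumes that the relative height of the normal curve at z is e^(−z²/2), which is the fraction of the height at the mean that Table IA tabulates, and it uses the exact value of √(2π) rather than 2.51.

```python
import math

N_s = 250                  # size of the sample
sigma = 5.70               # standard deviation in score units
i = 2.0                    # width of a class interval in score units
sigma_prime = sigma / i    # standard deviation in step-interval units (2.85)

# Formula [15:3], with the exact value of sqrt(2 * pi) in place of 2.51
y0 = N_s / (sigma_prime * math.sqrt(2 * math.pi))
print(f"y at mean = {y0:.1f}")

# The ordinate at z is y0 times the relative height exp(-z**2 / 2)
for z in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0):
    print(f"y at ±{z:.1f} sigma = {y0 * math.exp(-z * z / 2):.1f}")
```

Because of these unrounded constants, the sketch gives about 35.0 frequencies at the mean and 30.9 at ±0.5σ, slightly above the 34.9 and 30.7 obtained in the text.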


Fig. 15:4. The Normal Probability Curve Fitted to a Sample Distribution of Test 



These values were used to plot the normal curve in Fig. 15:4, which also 
portrays the distribution of the actual sample result. 

Although the hypothetical normal distribution itself can be readily obtained,
we still do not have the hypothetical frequencies of the normal curve for the
class intervals of the sample distribution, and it is these frequencies that we
need for a chi-square Test of Significance. In order to obtain them we must
(1) lay off the original score limits of the class intervals in terms of their
z score (x/σ) equivalents, (2) determine the proportion of the area in the
normal distribution that lies within each interval, and (3) take these
proportionate areas to a base of N_s so as to obtain the hypothetical
frequencies of each class interval. The results are presented in Table 15:4.

Column 1 of this table lists the 14 classes (class intervals) of the test score 
variable. However, the last two are combined with the 12th because of the 
few hypothetical frequencies in these intervals. Columns 2 and 3 give the 
frequency distribution of the sample. The upper mathematical limits of each 
class interval are given in Column 4; these are the points in the distribution 
to be converted to z score equivalents. 

Column 5 gives the deviation value, x, of the upper limit of each class
interval. Thus for Class 1, the upper limit is 101.5 and its x value is
101.5 − 90.7 = 10.8. Column 6 gives the x/σ or z score equivalents of the upper
limits of each interval. The proportion of the area in each class interval can
be obtained from Table I, Appendix B, which gives the area of the normal curve
between the mean and any point above or below it in x/σ or z score units. The
values in Column 7 are read directly from that table. Thus, .4706 of the area
lies between the mean and x/σ = 1.89, the upper limit of the sample
distribution. The normal distribution, however, theoretically extends to
infinity, and hence the total area of the upper half of the distribution, .5000,
is also given.

The proportions of the total area within each class interval in Column 8
are obtained from Column 7. Thus, the area of the tail at the upper end of
the distribution is equal to .0294. This is found by taking the difference
between .5000 and .4706, the proportion of the area between the mean and
101.5. Similarly, the proportion of the area in Class Interval 1 is
.4706 − .4383 = .0323; in Class Interval 2, .4383 − .3830 = .0553, etc. For the
class interval in which the mean is located, the two area values in Column 7
must be added rather than subtracted, because .0557 is the area between 91.5
and the mean of 90.7, and .0832 is the area between 89.5 and the mean. This sum
is .1389, as shown in Column 8. The proportion of the area between the upper
limit of Class Interval 12 and the mean is given in Column 7 as .4750. Hence
.0250 of the total area lies below this point, i.e., between 79.5 and −∞.

The proportion of the area within each class interval in Column 8 is
converted to frequencies by multiplying each proportion by N_s, the size of the
sample. These hypothetical frequencies for the chi-square Test of Significance
are given in Column 9. At the upper tail of the distribution, the frequencies
(7.35) that lie beyond the limits of the sample distribution are combined
with those of the highest class interval, and hence 15.43 is the number of
hypothetical frequencies for Class Interval 1.
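The three steps just described (converting class limits to z scores, finding the normal area in each interval, and multiplying by N_s) can be sketched in Python. This is an illustration, not a reproduction of Table 15:4: it computes exact normal areas with `math.erf` instead of reading Table I at two-place z values, and it uses the mean (90.7) and standard deviation (5.70) given in the text, with Class 1 absorbing the upper tail and Class 12 absorbing everything below 79.5.

```python
import math

N_s = 250
mean, sigma = 90.7, 5.70


def normal_cdf(x: float) -> float:
    """Proportion of the normal area lying below score x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2))))


# Class boundaries from the top down: Class 1 runs from 99.5 to +infinity and
# Class 12 from 79.5 to -infinity (Classes 13 and 14 plus the lower tail).
boundaries = [math.inf] + [99.5 - 2 * k for k in range(11)] + [-math.inf]

# Area between successive boundaries, taken to a base of N_s
f_h = [N_s * (normal_cdf(hi) - normal_cdf(lo)) for hi, lo in zip(boundaries, boundaries[1:])]

for cls, f in enumerate(f_h, start=1):
    print(f"Class {cls:2d}: f_h = {f:6.2f}")
print(f"Total: {sum(f_h):.2f}")
```

The resulting f_h values (about 15.3 for Class 1, 34.8 for Class 6, and 6.2 for Class 12) differ slightly from those in Table 15:5 (15.45, 34.72, and 6.25) because of the rounding of z values in the table lookups; the twelve values sum to N_s = 250.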

The Computation of Chi-Square 

It is now possible to determine by chi-square whether the difference between 
the sample distribution of frequencies and a normal distribution of frequencies 
is or is not attributable to random errors of sampling and measurement. 



Table 15:4. [Entries not recoverable from the scanned page. Columns: (1) Classes,
1 to 14; (2) Test Scores (Mean = 90.7, σ = 5.70); (3) Frequencies per Class
Interval, f_s (N = 250); (4) Upper Limit of Intervals; (5) Upper Limits Minus M, x;
(6) z Score Values, x/σ; (7) Proportion of Area from M *; (8) Proportion of Area
Within Each Class; (9) Hypothetical Frequencies per Class, f_h (total 250.00).]

* From Table I, Appendix B.


The chi-square analysis is given in Table 15:5, the computations being 
made in exactly the same way as in Tables 15:1 and 15:3. Chi-square is found 
to be 27.75. What is the P value for this result? To answer this question, the 
number of degrees of freedom must first be determined. This is either 9 or 11, 
depending upon the way in which the sampling of the universe is interpreted. 


Table 15:5. Chi-Square Test of Significance for a Normal Distribution
(Data from Table 15:4)

  (1)        (2)           (3)           (4)           (5)             (6)
 Class     Sample      Hypothetical   Differences   Differences     Chi-Square
           Frequencies Frequencies                  Squared         Ratio
             f_s          f_h         (f_s − f_h)   (f_s − f_h)²    (f_s − f_h)²/f_h

  (1)         11         15.45           4.45          19.80            1.28
  (2)         18         13.80           4.20          17.64            1.28
  (3)         33         20.85          12.15         147.62            7.08
  (4)         20         27.90           7.90          62.41            2.24
  (5)         25         33.05           8.05          64.80            1.96
  (6)         43         34.72           8.28          68.56            1.97
  (7)         36         32.28           3.72          13.83             .43
  (8)         22         26.58           4.58          20.98             .79
  (9)         11         19.40           8.40          70.56            3.64
 (10)         10         12.52           2.52           6.35             .51
 (11)         14          7.18           6.82          46.51            6.48
 (12)          7          6.25            .75            .56             .09

        N_s = 250    N_h = 250                                    χ² = 27.75

From Table 15:2:
For 11 d.f., P = .01 for χ² ≥ 24.72
For 9 d.f., P = .001 for χ² ≥ 27.88
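The chi-square of Table 15:5 and the two possible values of d.f. can be checked with a short Python sketch. The frequencies are copied from the table; the sample frequency of Class 9 is taken as 11, the value implied by its hypothetical frequency (19.40) less its difference (8.40) and by the total of 250.

```python
# Sample (f_s) and hypothetical normal (f_h) frequencies for Classes 1-12, from Table 15:5.
f_s = [11, 18, 33, 20, 25, 43, 36, 22, 11, 10, 14, 7]
f_h = [15.45, 13.80, 20.85, 27.90, 33.05, 34.72, 32.28, 26.58, 19.40, 12.52, 7.18, 6.25]

chi_sq = sum((s - h) ** 2 / h for s, h in zip(f_s, f_h))

# Degrees of freedom under the two interpretations discussed in the text:
df_general = len(f_s) - 1     # only N_s constrains the hypothetical frequencies
df_restricted = len(f_s) - 3  # N_s, plus the sample mean and standard deviation

print(f"chi-square = {chi_sq:.2f}; d.f. = {df_general} or {df_restricted}")
print("exceeds the .01 point for 11 d.f. (24.72):", chi_sq >= 24.72)
print("exceeds the .01 point for 9 d.f. (21.67):", chi_sq >= 21.67)
```

Summing unrounded ratios gives about 27.74; the 27.75 of the table is the sum of ratios already rounded to two places. Either way, the value exceeds the .01 points for both 11 and 9 d.f.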


If, on the one hand, the sample is considered as being drawn from a universe
of test scores made by college students in an introductory psychology course,
d.f. is equal to the number of class intervals (12) less 1, since at least one
degree of freedom is lost by the constraint of N_s, the size of the sample. If,
on the other hand, the sample is considered as being drawn from a restricted
universe with the mean and standard deviation equal to those of the sample
result, two additional constraints are imposed by these parameters, and
d.f. = 9. In the present problem, either interpretation leads to the same
conclusion, even though the former interpretation is generally the one
intended.

Table 15:2 shows that for 11 d.f. the probabilities are only .01 (1 in 100)
that χ² will be equal to or greater than 24.72, and only .001 (1 in 1000) that
it will be equal to or greater than 31.26. The obtained value of 27.75 lies
between these two points, so that P is less than .01 but greater than .001; for
9 d.f., 27.75 exceeds 21.67, the .01 point, and falls just short of 27.88, the
.001 point. According to the 1% confidence criterion, therefore, the hypothesis
of a normal variate can be rejected under either interpretation; that is, the
distribution of these test results cannot be considered as purely a random
divergence from a normally distributed universe. This result, together with an
inspection of Fig. 15:1, suggests either that the test itself was not properly
designed to yield an adequate differentiation among both the brighter and the
less informed students, or that the variation of students’ abilities may have
been atypical; both factors can of course be present. The causes underlying the
result, however, cannot be determined from these statistical results alone.

It should be noted that the chi-square Test of Significance for the
distribution in Fig. 15:1 yields a different result from the centile analysis in
terms of skewness and kurtosis developed in Chapter 13 at the end of Section D.
Although these two properties are often a sufficient expression of the normality
or non-normality of a distribution, they cannot always be relied upon to give
adequate Tests of Significance for a distribution both as a whole and in
detail. The divergence between the frequencies of Class Intervals 3 and 11
(Table 15:5) and the hypothetical frequencies was not sufficiently taken into
account by the centile measures of skewness and kurtosis. Hence, whenever the
distribution of a large number of cases is erratic at points that do not
materially affect C90, C75, C50, C25, and C10 (the centile values employed for
measuring skewness and kurtosis), a Test of Significance for the hypothesis
of a normally distributed variate can be more accurately set up in terms of
chi-square.

B. CHI-SQUARE TESTS OF SIGNIFICANCE FOR THE
INDEPENDENCE OF TWO ATTRIBUTES

Chi-square was employed in Tables 15:1, 15:3, and 15:5 to test hypotheses 
concerning the distribution of a single attribute or variable. The technique 
is also useful in testing hypotheses about co-relationships between two
attributes or variables. These chi-square tests, which are usually referred to as
tests of independence between two sets of sample data, are in effect Tests of 
Significance for the null hypothesis of no correlation between cross-tabulated 
attributes or bi-variates. A chi-square analysis will indicate whether the 
correlation is any greater than would be expected on the basis of chance. 

Chi-Square Test of Significance for Correlation Between 
Dichotomized Attributes 

Random samples of 100 men and 100 women are interviewed concerning 
their habits of listening to a radio program. Sixty-five of the men say they 
are non-listeners and 35 say they are listeners to the program. Among the 
women, 90 are non-listeners and 10 are listeners. The results are
cross-tabulated in Table 15:6.

This table indicates that there is a tendency for a greater proportion of
listeners to be found among men than among women. Is this a chance
relationship, or is there a real tendency toward correlation of sex and listening
habits? In other words, is the correlation in this sample result significantly 
greater than zero? 





Table 15:6. Cross-Tabulation of Listening Habits Toward a Radio Program
with the Sex of the Respondents

                               Sex Groups
                          Men            Women              n_r
Non-Listeners          a   65         b   90         155 (a + b)
Listeners              c   35         d   10          45 (c + d)
n_c                        100            100     N_s = 200 (a + b + c + d)
                         (a + c)        (b + d)


The first step in a chi-square Test of Significance for independence between 
attributes consists in determining the hypothetical distribution of frequencies 
to be expected on the basis of chance alone. For this, the marginal totals 
of the dichotomies of each attribute, as well as Ns, must be taken into account. 
Table 15:6 indicates that 155 members of the total sample were non-listeners 
and only 45 were listeners. If this division of listeners and non-listeners is 
taken into account, as well as the equal division of the whole group on the 
basis of sex, the following three constraints are imposed in setting up the 
theoretical frequencies of the hypothesis to be tested: (1) N_s, the size of the
total sample; (2) the equal division on the basis of sex; and (3) the
proportionate division of non-listeners and listeners. These three constraints on the
hypothetical frequencies mean the loss of three degrees of freedom. Since 
there are only four categories of frequencies in the table, only one degree of 
freedom remains. 

In order to establish a hypothetical distribution of frequencies for the four 
cells which will be divided on the basis of chance expectancy, the following 
computations are necessary for any one cell of the two by two cross-tabulation 
(although this discussion is based on cell a, any other cell could have been 
used): 

1. The probability of non-listeners is determined. This is based upon the
data of the sample result and is equal to the ratio of non-listeners to the
total sample, viz., 155/200, or 31/40.

2. The probability of men is determined. This is the ratio of men to the
total sample, viz., 100/200, or 1/2.

3. The probability of subjects who are both men and non-listeners is then
determined. This is equal to the product of the probability of men and the
probability of non-listeners, or

(155/200)(100/200) = .3875

because the probability of the joint occurrence of two independent events
(assumed for the hypothesis) is equal to the product of their respective
probabilities. This, then, is the probability, under the conditions of the
sample result, of obtaining male non-listeners on the basis of chance. The
probability value of .3875 provides an estimate of the result to be expected
for cell a if the attributes are in fact independent.

Since chi-square is a technique for testing hypotheses about frequencies, the
probability value of .3875 must be converted to the number of frequencies to
be expected for a sampling distribution in which N_s = 200. Hence the
hypothetical frequencies for any cell are equal to the product of the probability
value for the cell and N_s (the size of the sample). For cell a this is 77.5 (the
product of .3875 and 200). This value is therefore the hypothetical frequency
of male non-listeners to be expected on the basis of chance alone for samples of
200 cases, 155 of whom are non-listeners and 100 of whom are men.

The calculation of the hypothetical frequencies for any cell of a
cross-tabulation of the data of two attributes may be summarized as follows:

$$f_h = \frac{n_r \, n_c}{N_s} \qquad [15:4]$$
(Hypothetical frequencies for any cell of cross-tabulated attributes, based on the
product theorem of the probability of the joint occurrence of independent events)

where n_r is the number of cases in the row that intersects the cell; n_c is the
number of cases in the column that intersects the cell; and N_s is the size of
the sample. Thus for cell a in Table 15:6:

$$f_{h(a)} = \frac{(155)(100)}{200} = 77.5$$


Once the number of hypothetical frequencies for any cell of a fourfold table
is determined, the frequencies for the remaining three cells are strictly
determined, for there is only one degree of freedom. The hypothetical
frequencies of cell c are equal to (a + c) − a, i.e., 100 − 77.5 = 22.5; those of
cell b are equal to (a + b) − a, i.e., 155 − 77.5 = 77.5; and those of cell d are
equal to (c + d) − c, or (b + d) − b, i.e., 22.5. The hypothetical distribution of
frequencies to be expected on the basis of chance for the data in Table 15:6 is
summarized in Table 15:7.

Having established the hypothetical frequencies to be expected for each
cell on the basis of chance, we can now proceed with a chi-square Test of
Significance. Chi-square is calculated exactly as before. The computations
are given in Table 15:8, where the value of chi-square is seen to be 17.92.
Table 15:2 indicates that when there is one degree of freedom the probabilities
are only 1 in 1000 of obtaining, on the basis of chance alone, a chi-square
value equal to or greater than 10.83.

Table 15:7. The Hypothetical Distribution of Frequencies for a Chi-Square
Test of Independence of the Cross-Tabulated Data in Table 15:6

                               Sex Groups
                          Men            Women
Non-Listeners          a   77.5       b   77.5
Listeners              c   22.5       d   22.5
n_c                        100            100

Table 15:8. Computation of Chi-Square for the Test of Independence
(Hypothetical Data from Table 15:7; Sample Data from Table 15:6)

        Frequencies   Hypothetical
        from          Frequencies for                 Differences     Chi-Square
        Sample        the Test of      Differences    Squared         Ratio
 Cell   f_s           Independence f_h (f_s − f_h)    (f_s − f_h)²    (f_s − f_h)²/f_h

  a       65             77.5            12.5          156.25         156.25/77.5 = 2.016
  b       90             77.5            12.5          156.25         156.25/77.5 = 2.016
  c       35             22.5            12.5          156.25         156.25/22.5 = 6.944
  d       10             22.5            12.5          156.25         156.25/22.5 = 6.944

                                                              χ² = Σ = 17.92

For 1 d.f.: P = .001 for χ² ≥ 10.83

Since the chi-square value obtained is considerably larger than 10.83, we
can reject the null hypothesis. In other words, we can be quite confident that
at least some of the correlation in the sample result is not fortuitous but is
based on other than chance factors. The correlation, although apparently not
large, is nevertheless significantly greater than zero.
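The whole procedure for a fourfold table, hypothetical frequencies from formula [15:4] followed by chi-square, can be sketched in Python as below. The nested-list layout of the table (rows for non-listeners and listeners, columns for men and women) is an assumption of this illustration.

```python
# Cross-tabulation from Table 15:6: rows are Non-Listeners and Listeners,
# columns are Men and Women (cells a, b in the first row; c, d in the second).
observed = [[65, 90],
            [35, 10]]

row_totals = [sum(row) for row in observed]        # n_r: 155 and 45
col_totals = [sum(col) for col in zip(*observed)]  # n_c: 100 and 100
N_s = sum(row_totals)                              # 200

# Formula [15:4]: f_h = n_r * n_c / N_s for each cell
expected = [[r * c / N_s for c in col_totals] for r in row_totals]

chi_sq = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(2)
    for j in range(2)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print("expected:", expected)
print(f"chi-square = {chi_sq:.2f} with {df} d.f.; exceeds 10.83: {chi_sq >= 10.83}")
```

The hypothetical frequencies come out as 77.5, 77.5, 22.5, and 22.5, and chi-square as about 17.92 with 1 d.f., as in Tables 15:7 and 15:8.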

Pearson’s Short-Cut Computation of χ² for 2 by 2 Cross-Tabulations *

The preceding Test of Significance for the cross-tabulated data of
dichotomized attributes can be quickly computed by a short-cut method that is
algebraically equivalent but eliminates the separate computation of the
hypothetical frequencies. This was developed by Karl Pearson and is obtained
by the following:


* K. Pearson, op. cit., p. xxxiv.













$$\chi^2 = \frac{N_s(ad - bc)^2}{(a + b)(c + d)(b + d)(a + c)} \qquad [15:5]$$
(χ² for Test of Independence of 2 by 2 cross-tabulations)

For the data in Table 15:8, χ² by this short-cut method is as follows:

$$\chi^2 = \frac{200[(65)(10) - (90)(35)]^2}{(65 + 35)(90 + 10)(65 + 90)(35 + 10)} = \frac{1{,}250{,}000{,}000}{69{,}750{,}000} = 17.92$$
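Pearson's short-cut formula [15:5] translates directly into code. In this minimal Python sketch the function name `chi_square_2x2` is our own, and the cells are labelled as in Table 15:6.

```python
def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson's short-cut chi-square for a fourfold table (formula [15:5]).

    Cells follow Table 15:6: a and b form the first row, c and d the second,
    so (a + c) and (b + d) are the column totals.
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (b + d) * (a + c)
    return numerator / denominator


chi_sq = chi_square_2x2(65, 90, 35, 10)
print(f"chi-square = {chi_sq:.2f}")
```

Because the arithmetic stays in whole numbers until the final division, the short-cut avoids rounding the hypothetical frequencies; here it returns 1,250,000,000/69,750,000, or about 17.92, the same value as the longer computation.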


Chi-Square Test of Significance for Correlation Between 
Attributes with More Than Two Categories 

Table 4:12 presented a 2 by 4 cross-tabulation between income status of 
2026 respondents and their opinion on private vs. government management 
of business. These data are given in Table 15:9, together with the hypothetical
frequencies in brackets. 

Table 15:9. Sample and Hypothetical Frequencies for a Chi-Square Test of
Independence of Respondents’ Income Status and Their Opinions on Private vs.
Government Management of Business

                                        Income Status
                      Low          Lower        Upper        High
                                   Middle       Middle                   n_r
Private            a   230       b   660      c   570      d   225     1685 (a + b + c + d)
management            [291]         [665]        [531]        [198]
Government         e   120       f   140      g    68      h    13      341 (e + f + g + h)
management             [59]         [135]        [107]         [40]
n_c                    350           800          638          238     N_s = 2026 (a + b + ... + h)
                     (a + e)       (b + f)      (c + g)      (d + h)


Again as in the preceding problem, the marginal totals for each row and 
column are taken into account in determining the hypothetical frequencies 
for a purely chance relationship. The degrees of freedom for any test of 
independence are given by the following: 

$$\text{d.f.} = (A_c - 1)(B_c - 1) \qquad [15:6]$$
(Number of degrees of freedom for a Test of Independence of cross-tabulated data)

where A_c represents the number of classes or categories for one attribute,
and B_c the number of classes or categories for the other attribute. For the
2 by 4 cross-tabulation in Table 15:9, d.f. equals (1)(3) = 3.
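The two rules for degrees of freedom used in this chapter, the number of categories less the number of constraints and formula [15:6] for cross-tabulations, can be collected in two small Python helpers. The function names are our own, and the calls reproduce the d.f. values of the examples discussed in the text.

```python
def df_categories(n_categories: int, n_constraints: int = 1) -> int:
    """d.f. = number of classes or categories minus number of constraints imposed."""
    return n_categories - n_constraints


def df_cross_tabulation(a_c: int, b_c: int) -> int:
    """Formula [15:6]: d.f. = (A_c - 1)(B_c - 1) for an A_c by B_c table."""
    return (a_c - 1) * (b_c - 1)


print(df_categories(2))                                      # Table 15:1, two categories
print(df_categories(3))                                      # Table 15:3, a trichotomy
print(df_categories(12, 1), df_categories(12, 3))            # Table 15:5, two interpretations
print(df_cross_tabulation(2, 2), df_cross_tabulation(2, 4))  # Tables 15:6 and 15:9
print(df_categories(4, 3) == df_cross_tabulation(2, 2))      # fourfold table, both rules
```

The last line confirms that the two rules agree for a fourfold table: 4 cells less the 3 constraints discussed for Table 15:6 leaves 1 d.f., the same as (2 − 1)(2 − 1).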






Since there are only 3 d.f. for these data, it is necessary to calculate the
hypothetical frequencies of only three cells (in either row); the others are
obtained by subtraction. Thus, for cells a, b, and c:

f_h(a) = 1685(350)/2026 = 291.1, or 291
f_h(b) = 1685(800)/2026 = 665.4, or 665
f_h(c) = 1685(638)/2026 = 530.6, or 531

By subtraction:

f_h(d) = 1685 − (291 + 665 + 531) = 198
f_h(e) = 350 − 291 = 59
f_h(f) = 800 − 665 = 135
f_h(g) = 638 − 531 = 107
f_h(h) = 238 − 198 = 40
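A sketch of formula [15:4] applied to all eight cells of Table 15:9 follows. Unlike the text, which computes three cells and obtains the rest by subtraction, it computes every cell directly; the row-and-column layout is an assumption of the illustration.

```python
# Sample frequencies from Table 15:9: rows are Private and Government management,
# columns are Low, Lower Middle, Upper Middle, and High income status.
observed = [[230, 660, 570, 225],
            [120, 140, 68, 13]]

row_totals = [sum(row) for row in observed]        # 1685 and 341
col_totals = [sum(col) for col in zip(*observed)]  # 350, 800, 638, 238
N_s = sum(row_totals)                              # 2026

# Formula [15:4] applied to every cell, rather than three cells plus subtraction
expected = [[r * c / N_s for c in col_totals] for r in row_totals]

for row in expected:
    print([round(f, 1) for f in row])
```

Rounded to whole numbers, the results are 291, 665, 531, 198 and 59, 135, 107, 40, the bracketed values of Table 15:9.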


Table 15:10. Computation of Chi-Square for Test of Independence
(Data from Table 15:9)

        Sample        Hypothetical   Differences    Differences      Chi-Square
        Frequencies   Frequencies                   Squared          Ratio
Cell    f_o           f_h            (f_o - f_h)    (f_o - f_h)^2    (f_o - f_h)^2/f_h
 a        230           291             -61            3721            12.79
 b        660           665              -5              25              .04
 c        570           531             +39            1521             2.86
 d        225           198             +27             729             3.68
 e        120            59             +61            3721            63.07
 f        140           135              +5              25              .19
 g         68           107             -39            1521            14.21
 h         13            40             -27             729            18.23
      N_s = 2026         2026                                     χ² = 115.07

From Table 15:2: When d.f. = 3, P = .001 for χ² ≥ 16.27.

The computation of chi-square for these data is shown in Table 15:10, and
chi-square is found to be 115.07. For 3 d.f., the probability is only 1 in 1000
of obtaining a chi-square equal to or greater than 16.27. Hence, we can reject
the null hypothesis, i.e., that income status and the respondents' opinions are
independent of each other. In other words, there is some correlation between
these two variables.

Contingency Coefficient

The Contingency Coefficient, C, was presented in Chapter 4 as a measure
for the correlation of polytomous attributes, and its limitations were indicated,
especially when the number of cells is small (see Table 4:15). C can be computed
by the method used in Table 4:13, or from chi-square. In the latter case,
it is equal to

$$C = \sqrt{\frac{\chi^2}{N_s + \chi^2}} \qquad [15:7]$$

(Contingency Coefficient from chi-square)

For the data in Table 15:8, C is equal to .29:

$$C = \sqrt{17.92/(200 + 17.92)} = \sqrt{.0822} = .29$$

and for the data in Table 15:10, to .23:

$$C = \sqrt{115.07/(2026 + 115.07)} = \sqrt{.0537} = .23$$

The latter is approximately the same as the value obtained in Table 4:13.
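The chi-square ratios and Formula 15:7 can be checked in the same way. This is a sketch under the same assumptions as above (cell layout taken from Table 15:10, illustrative function names). Because it uses unrounded hypothetical frequencies, its total will not agree to the last decimal with a hand computation that rounds each f_h to a whole number; the per-cell ratios it prints are a convenient line-by-line check on the hand work, since a misplaced decimal point in any one ratio shows up at once.

```python
import math


def expected_frequencies(table):
    """f_h = (row total x column total) / N for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(table):
    """Return (chi-square, list of per-cell ratios (f_o - f_h)^2 / f_h)."""
    ratios = []
    for obs_row, exp_row in zip(table, expected_frequencies(table)):
        for fo, fe in zip(obs_row, exp_row):
            ratios.append((fo - fe) ** 2 / fe)
    return sum(ratios), ratios


def contingency_coefficient(chi2, n):
    """Formula 15:7: C = sqrt(chi2 / (N + chi2))."""
    return math.sqrt(chi2 / (n + chi2))


observed = [
    [230, 660, 570, 225],  # cells a-d
    [120, 140, 68, 13],    # cells e-h
]
n = sum(sum(row) for row in observed)
chi2, ratios = chi_square(observed)

for cell, ratio in zip("abcdefgh", ratios):
    print(f"{cell}: {ratio:6.2f}")
print(f"chi-square = {chi2:.2f}")
print("exceeds 16.27 (P = .001, 3 d.f.):", chi2 >= 16.27)
print(f"C = {contingency_coefficient(chi2, n):.3f}")
```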

As was stated in Chapter 4, C was developed by Pearson for categorical
distributions on the assumption that each attribute is a continuously distributed
variable, that the distributions are similar in form to the normal
bell-shaped curve, and that a linear function is adequate or most suitable to
describe the correlations. Obviously not all of these assumptions are always
satisfied. Four decades ago there was more emphasis than there is today on
attempts to summarize correlation in terms of a measure analogous or equivalent
to product-moment r. Unfortunately, with the value of C limited by
the number of cross-tabulated categories, the measure is somewhat ambiguous
in its implications concerning correlation.

A test of the null hypothesis for C, the contingency coefficient, is obviously 
unnecessary when C is derived from chi-square. As a matter of fact, when 
the chi-square Test of Significance does not warrant the rejection of the null 
hypothesis, there is no point in computing C. 

Relation Between χ² and φ

Karl Pearson* pointed out the following fundamental relation between
χ² and φ:

$$\chi^2 = N_s \phi^2 \qquad [15:8]$$

(Equivalence of χ² and φ)

or

$$\phi = \sqrt{\chi^2 / N_s} \qquad [15:9]$$

(Equivalence of φ and χ²)


* Karl Pearson, op. cit., p. xxxv.
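Formula 15:9 applies most directly to a 2 by 2 table, for which φ can also be computed from the cell frequencies as (ad - bc)/√((a + b)(c + d)(a + c)(b + d)). The sketch below checks the equivalence numerically. The four cell frequencies are hypothetical, chosen only for illustration, and chi-square is computed the long way (from hypothetical frequencies) so that the check is not circular. Note that √(χ²/N) recovers only the size of φ; its sign must be read from the table (the sign of ad - bc).

```python
import math


def phi_direct(a, b, c, d):
    """Phi for a 2 x 2 table: first row (a, b), second row (c, d)."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


def chi_square(table):
    """Pearson chi-square from hypothetical frequencies based on the marginals."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            fh = row_totals[i] * col_totals[j] / n
            total += (fo - fh) ** 2 / fh
    return total


a, b, c, d = 40, 20, 10, 30  # hypothetical frequencies, for illustration only
n = a + b + c + d
phi = phi_direct(a, b, c, d)
chi2 = chi_square([[a, b], [c, d]])

print(f"phi (direct)      = {phi:.4f}")
print(f"sqrt(chi2 / N)    = {math.sqrt(chi2 / n):.4f}")         # Formula 15:9
print(f"N * phi**2, chi2  = {n * phi ** 2:.4f}, {chi2:.4f}")    # Formula 15:8
```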





EXERCISES 

1. Describe the kinds of hypotheses for which chi-square tests of significance are 
relevant. 

2. Set up a relevant hypothesis for the sample data in Table 2:4 and test it by chi- 
square. 

3. Set up a relevant hypothesis for the frequencies of males in Table 3:6 and test it 
by chi-square. 

4. Determine by chi-square whether either one of the distributions in Table 6:7 
diverges significantly from the normal, bell-shaped distribution. 

5. Use chi-square to determine the degree of correlation in terms of the Contingency 
Coefficient for the following data:* 

“If a man was paid $50.00 a week for 48 hours’ work in wartime and he is now
working only 40 hours a week, should he still be paid $50.00?”


                           Socio-Economic Groups
Respondents' Replies      D      C      B      A
Yes                      260    450    285    145
No                       195    470    420    325


6. Use the conversion formula (15:9) to obtain the phi coefficient for the data in
Exercise 5.


* Adapted from H. C. Link, “The Psychological Corporation’s Index of Public Opinion,” 
Journal of Applied Psychology, 30:1-9, 1946. 




CHAPTER 16 


The Predictive Meaning of Correlation 


The correlation coefficient is an index which summarizes the degree of association
or co-variation characteristic of the relationship between two attributes.
Several methods for measuring the degree of such correlation were presented 
in Chapters 4, 9, and 10. We saw that many of the correlation problems in 
psychology and related fields can usually be dealt with by Pearson’s product- 
moment method of linear correlation, or by methods that give an estimate of 
Pearson’s r. However, this earlier treatment was presented primarily from 
the point of view of descriptive statistics. Hence we shall now proceed to 
amplify the implications of correlation for problems in sampling and analytical 
statistics. 

One of the most practical ways of interpreting correlation coefficients is in 
terms of their predictive meaning.* Although a correlation coefficient is not 
necessarily obtained for the purpose of predicting values of one variable 
from given values of the other, the meaning of predictive estimates is basic to an 
evaluation of the practical usefulness of correlation. Only thus can we evaluate 
the practical meaning of varying degrees of r, such as .40, .60, .80, .90, etc. 

We saw earlier that the product-moment correlation coefficient is the
slope of a straight line that best fits a cross-tabulated set of correlational
frequencies whose respective measures have been equalized in terms of z scores.
We saw, further, that all bi-variates have two regression lines, the regression
of y on x and the regression of x on y, and that the value of product-moment r
is the geometric mean of the regression coefficients for each best-fitting straight
line function. From the point of view of sampling, the scatter of the correlational
frequencies about the regression lines gives a basis for determining the
accuracy with which values of one variable can be predicted from those of
another. For example, as the correlation coefficient approaches zero, the
scatter of the correlational frequencies approaches a maximum, as indicated
in Fig. 16:1. On the other hand, as the correlation coefficient approaches
unity (either 1.0 or -1.0), the scatter of the correlational frequencies about
the regression line approaches a minimum. In the case of perfect correlation,
there is no scatter whatsoever. (See Figs. 16:2, 16:3, and 16:4.) The scatter of
normal correlation surfaces forms an ellipse which becomes narrower and
approaches a straight line as the correlation increases, and which widens and
approaches a circle as the correlation decreases toward zero.

[Fig. 16:1. Scatter of Correlation Frequencies for r_xy = .00. Scatter at a Maximum]

[Fig. 16:2. Scatter of Correlation Frequencies for r_xy = .60]

[Fig. 16:3. Scatter for r_xy = .60 as Measured per Class Interval in Terms of σ_y√(1 - r²_xy), and with Equal Variability for Each Class Interval Assumed]

[Fig. 16:4. Perfect Positive Correlation, r_xy = 1.0. No Scatter: σ_est = zero]


* Cf. J. G. Peatman, “On the Predictive Meaning of Correlation,” Journal of General
Psychology, 22:17-23, 1940.

Two problems arise in interpreting the predictive meaning of a correlation
coefficient. The first involves the mathematical procedure for making a prediction.
The second concerns the accuracy or efficiency of the prediction
made. The making of a prediction is based mathematically on the regression
equation of the line that best fits the co-variation of the data. The accuracy
or efficiency of the prediction is determined in terms of the standard error of
estimate. This is based on the standard deviation of the scatter of correlational
frequencies about the best-fitting straight line, and is an estimate of the
standard deviation of the sampling distribution of predicted values. For a
correlation between variables x and y, there are two standard errors of estimate:
one for the regression of y on x, and the other for the regression of x on y.




A. MAKING THE PREDICTION 


The prediction of values of one variable from given values of the other is
based on the correlation between the sample results. When the correlation
is linear, as it is assumed to be for product-moment correlation, predictions can
be made by means either of regression (straight-line) equations or of a straight
line fitted to the graphic distribution of variations of x with respect to y (or
y with respect to x). We saw in Chapter 9, Section B, that the regression equations
in deviation score form are as follows:

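$$y = r\frac{\sigma_y}{\sigma_x}x, \quad \text{or} \quad y = b_{yx}x \qquad [16:1]$$

(Regression of y on x in product-moment correlation)

$$x = r\frac{\sigma_x}{\sigma_y}y, \quad \text{or} \quad x = b_{xy}y \qquad [16:2]$$

(Regression of x on y in product-moment correlation)

in which the regression coefficients b_xy and b_yx are respectively

$$b_{xy} = r\frac{\sigma_x}{\sigma_y} \quad \text{and} \quad b_{yx} = r\frac{\sigma_y}{\sigma_x}$$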

In practice, however, it is usually more convenient to estimate one variable
from given values of the other in terms of original scores rather than in terms
of x and y. The regression equations in original score form are as follows:

$$Y = r\frac{\sigma_y}{\sigma_x}X - r\frac{\sigma_y}{\sigma_x}M_x + M_y \qquad [16:3]$$

(Regression of Y on X)

$$X = r\frac{\sigma_x}{\sigma_y}Y - r\frac{\sigma_x}{\sigma_y}M_y + M_x \qquad [16:4]$$

(Regression of X on Y)


The use of these two equations in making predictive estimates will be illustrated
with the following data: The correlation between intelligence test
scores (x variable) and grades (y variable) was found to be .50 for a college
sample of 200 subjects. The means and standard deviations of each variable
were approximately:

                        Intelligence Test Scores (x)    Grades (y)
Mean                    80.0                            75.0%
Standard deviation      σ_x = 15.0                      σ_y = 8.0%


Assuming that the relationship is linear and that these results are derived
from a random sample of a normally distributed universe, we can estimate
the average intelligence test score for a given grade score or, conversely, the
average grade score for a given test score. The latter is more often
required because scholastic achievement (as measured by grades) is usually
considered to be a function of intelligence. The interrelation of causal factors,
however, is complex.




The regression equation for Y on X gives the following result for these
data:

$$Y = .50\frac{8.0}{15.0}X - .50\frac{8.0}{15.0}(80.0) + 75.0 = .267X - 21.360 + 75.0 = .267X + 53.640$$

This equation is the basis for estimating Y (grade scores) from given values
of X (intelligence test scores) when .50 is the correlation for the sample data.
In order to check the computation of the equation, it is well to predict the
average grade score for an intelligence test score equal to the mean value of
80.0. The predicted value of Y will be equal to the mean grade score of 75.0%
(or a grade of C), because the two regression lines of a product-moment
correlation intersect at the point whose coordinates are the arithmetic
means of the respective distributions. Although an actual prediction from the
mean of 80.0 is thus unnecessary, the computation is relevant, since it checks
the values obtained in the equation. Thus:

$$Y = .267(80.0) + 53.640 = 21.360 + 53.640 = 75.0\%$$

The average grade score of students with intelligence test scores of 95.0,
one standard deviation above the mean,

$$M_x + 1\sigma_x = 80.0 + 15.0 = 95.0$$

is predicted as follows:

$$Y = .267(95.0) + 53.640 = 25.365 + 53.640 = 79.0\%$$

This value indicates that students with intelligence test scores of 95 will, on
the average, have a grade score of 79%, or C+.

The average grade score for students with intelligence test scores of 110.0,
two standard deviations above the mean, will be equal to the following:

$$Y = .267(110.0) + 53.640 = 29.370 + 53.640 = 83.0\%$$

Students with intelligence test scores of 110 will have an average grade score
of approximately 83%, or slightly less than B. Students with intelligence test
scores of 50.0, two standard deviations below the mean, will have, on the
average, the following grade scores:

$$Y = .267(50.0) + 53.640 = 13.350 + 53.640 = 67.0\%$$

Such students will, on the average, be expected to have grade scores of
67.0%, or slightly better than D.
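The worked predictions can be reproduced from Formula 16:3. This minimal sketch keeps the full slope r(σ_y/σ_x) = .50(8.0/15.0) instead of the rounded .267, so its results agree with the hand computations above to the first decimal place; the function and constant names are illustrative.

```python
def predict_y(x, r, mean_x, sd_x, mean_y, sd_y):
    """Formula 16:3: Y = r(sd_y/sd_x)X - r(sd_y/sd_x)M_x + M_y."""
    b_yx = r * sd_y / sd_x  # regression coefficient of y on x
    return b_yx * x - b_yx * mean_x + mean_y


# Summary statistics given in the text (x = intelligence test score, y = grade %).
R, MEAN_X, SD_X, MEAN_Y, SD_Y = 0.50, 80.0, 15.0, 75.0, 8.0

for score in (50.0, 80.0, 95.0, 110.0):
    grade = predict_y(score, R, MEAN_X, SD_X, MEAN_Y, SD_Y)
    print(f"X = {score:5.1f}  ->  predicted average Y = {grade:.1f}%")
```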

Such predictions are the best estimates possible with the sample data
available. Each prediction, it should be emphasized, is an average estimate.
Only if the correlation between two variables is perfect will all the values of
Y be the same for a given value of X. As the correlation approaches zero,
the average estimate for Y is less and less accurate, because the scatter
about the average value increases. The greater the scatter, the less reliable
the prediction; and, conversely, the less the scatter, the more accurate or
reliable the prediction. When the correlation is zero, the scatter for estimates
of Y from any value of X is at a maximum and is equal to the variability
(standard deviation) of the y variable itself. Before describing a method for
computing the error of estimation for a given value of X when the correlation
is not zero, we shall illustrate graphically the predictive estimates which have
been made.


Predictions on a Correlation Matrix 

Two types of graphs can be used to show the predictive relationship between
two variables. In one type, the geometric field of coordinate axes is
used, as in Fig. 16:5. The x and y variables are scaled on the abscissa and
ordinate, respectively, in such a way that the scale distances are equalized in
terms of standard deviation units. In other words, the scales are the same as
those used in Fig. 9:10 for a z score cross-tabulation chart. The scales for
both z scores and original scores are given for each variable.


[Fig. 16:5. Predicted Values of y from Given Values of x When r_xy = .50. Abscissa: x variable (Intelligence Test Scores), 35 to 125, i.e., z = -3.0 to +3.0; ordinate: y variable (Grades), 51% to 99%, i.e., z = -3.0 to +3.0.]

[Fig. 16:6. Predicted Values of y from Given Values of x When r_xy = .50. Parallel scales for the y variable (Grades), 51% to 99%, and the x variable (Intelligence Test Scores), 35 to 125, each marked in z units from -3.0 to +3.0.]


As already indicated, the predicted estimate of Y from the mean value of x 
is the mean of y, because the regression line intersects the coordinate axes at 
the means. Also, a grade score one-half a standard deviation above the mean 
grade score is the value predicted from an intelligence test score one standard 
deviation above the mean intelligence test score. The predicted value with 
the regression equation is 79.0%, and this value is also obtained by the graphic 
method in Fig. 16:5. The other predictions computed above are also shown. 

In the second type of graph, two parallel horizontal scales, one for each
variable, are used, as shown in Fig. 16:6. Instead of the two scales being laid
off as the two axes of a geometric field, they are laid off parallel to each other.
The deviation distances on each scale are drawn in z score units that are
equal to each other, with the mean of the y variable directly opposite the
mean of the x variable. A point one standard deviation above the mean of the
y variable has the same relative position on the scale as a point one standard
deviation above the mean of the x variable, etc. The predictions computed
above are shown in this figure.

When r is .50, the z score value of one variable predicted from the z score
value of the other will be exactly half as great (except at the mean, where both
are zero). This follows from the fact that in z score form the regression equation
of z_y on z_x is

$$z_y = r_{xy}z_x = .50z_x$$

In other words, when r is .50, the value of y predicted from any given value of x
lies, in standard deviation units, exactly half as far from the mean of y as the
given value of x lies from the mean of x.

[Fig. 16:7. Values of y Predicted from Several Values of x When r_xy = .00, .25, .50, .75, and 1.0]



The prediction of measures of one variable from another variable that is
correlated with it is further illustrated in Fig. 16:7. Again, the relationship
between two variables is shown by parallel scales drawn so that the corresponding
z score values are directly opposite each other. The predictive estimates
made from z_x scores of -2.0, +1.0, and +3.0 are drawn for different
degrees of correlation, namely, r = .00, .25, .50, .75, and 1.0. It is apparent
that when r is zero, regardless of the z score value from which a prediction is
made, the average estimated value on the y variable is the mean of y (a z_y score
of zero). On the other hand, when the correlation is positive and perfect
(r = 1.0), the predicted z_y values are identical with the z_x values from which
the predictions are made. As the correlation decreases from 1.0 to .00, however,
the predicted values of z_y “regress” toward the mean value of y, i.e.,
z_y = 0.

In negative correlation, prediction is logically similar to that in positive
correlation except that the positive deviations of one variable tend to be
associated with the negative deviations of the other. In other words, when
there is a perfect negative correlation, z_x values of +3.0 are associated with
z_y values of -3.0, z_x values of -2.0 are associated with z_y values of +2.0,
etc. As negative correlations approach zero, the regression is similar to that
in positive correlation, since the predicted values for one variable progressively
approach the mean value of that variable.
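The regression toward the mean described for Fig. 16:7, and the sign reversal in negative correlation, both follow from the z score form z_y = r z_x. This sketch tabulates the predicted z_y for the three z_x values and five correlations used in that figure, then shows the case r = -1.0; the function name is illustrative.

```python
def predicted_zy(zx, r):
    """Regression of y on x in z score form: z_y = r * z_x."""
    return r * zx + 0.0  # adding 0.0 turns a -0.0 result into 0.0 for printing


correlations = (0.00, 0.25, 0.50, 0.75, 1.00)
print("z_x   " + "  ".join(f"r={r:.2f}" for r in correlations))
for zx in (-2.0, 1.0, 3.0):
    print(f"{zx:+.1f}  " + "  ".join(f"{predicted_zy(zx, r):+6.2f}" for r in correlations))

# Perfect negative correlation reverses the sign of the deviation.
print(predicted_zy(3.0, -1.0), predicted_zy(-2.0, -1.0))
```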


B. THE ACCURACY OR EFFICIENCY OF PREDICTIONS 

Only a few of the predictive implications of product-moment correlation
were illustrated in the preceding section, and they are only the first aspects of
the total situation. The research worker must also be aware of the accuracy or
efficiency of predictions for varying degrees of correlation. Predictions can
always be made from either of two correlated variables, regardless of the
degree of correlation between them. As we have seen, when the correlation
coefficient itself is zero, a predictive estimate can be made even though it is no
more informative than a guess. When r is zero, all predictive estimates for
one variable will be the mean of that variable, regardless of the values of the
other variable. However, the accuracy or efficiency of predictive estimates
varies tremendously for different degrees of correlation, the efficiency being
zero when r is zero.


The Standard Error of Estimate 


We have already pointed out that the scatter of correlational frequencies 
about either regression line furnishes a graphic picture of the efficiency with 
which a predictive estimate can be made. When the correlation is zero, the 
scatter is at a maximum. In the case of bi-variates which are distributed 
normally, the correlational frequencies are concentrated near the center of 
the bi-variate distribution and the scatter is distributed circularly from that 
center. As the degree of correlation increases, negatively or positively, the 
scatter decreases gradually and forms an elliptic pattern about the regression 
line. 

What is next needed is an algebraic method for expressing the degree of
scatter characteristic of different product-moment correlation coefficients.
The scatter is measured in terms of the standard deviation, and is the standard
error of estimate already referred to.* This estimate of scatter about the
best-fitting line is readily obtained from the following formulas:

$$\sigma_{est_y} = \sigma_y\sqrt{1 - r_{xy}^2} \qquad [16:5]$$

(Standard error of estimate of y on x)

$$\sigma_{est_x} = \sigma_x\sqrt{1 - r_{xy}^2} \qquad [16:6]$$

(Standard error of estimate of x on y)

where σ_y and σ_x are the standard deviations of the distributions of the
variables correlated, and r_xy is the correlation between them.


* The P.E. of estimate is sometimes used; it is equal to .6745 σ_est.





The measure of scatter is thus a function of the degree of co-variability
between two variables. The expression under the radical is the basic measure
of the degree of scatter between two variables and serves to reduce σ_x or σ_y
accordingly. T. L. Kelley* called √(1 - r²_xy) the coefficient of alienation and
symbolized it by k. This coefficient gives the ratio of the variability of the
measures in any class interval of x or y to the variability of y or x as a whole.†
Thus, the estimated scatter of any class interval for the y variable, relative to
the variability of y as a whole, is:

$$k = \frac{\sigma_{est_y}}{\sigma_y} = \sqrt{1 - r_{xy}^2} \qquad [16:7]$$

(Coefficient of alienation, k)


An inspection of this formula shows that when r is equal to 1.0 (or -1.0), k is
equal to zero, and that consequently the values of the standard error of estimate
in Formulas 16:5 and 16:6 also are zero. This is the situation in perfect
correlation, since there is no scatter about the regression line. On the other hand,
when r is zero, k is equal to 1.0, and the value of the standard error of estimate
is identical with the measure of variability of the distribution. In other words,
when r is zero, the error of estimate is at a maximum and is equal to the
standard deviation of the variable itself.

Between r values of zero and 1.0 (or -1.0), the scatter of correlational
frequencies about the regression line decreases gradually as the correlation
coefficient increases from zero. The way in which the degree of scatter decreases
must be clearly understood because it is the basis for interpreting the predictive
efficiency or accuracy of correlation coefficients. The relationship between
the coefficient of alienation, k, and varying degrees of correlation is given
in Table 16:1.

As already indicated, if r is zero, the error of estimate is at the maximum,
and is equal to the standard deviation of the variable for which predictions
are being made. Thus, when r is zero, any predictions of y from x or of x
from y are no better than a guess. Regardless of the value of x, the predicted
value of y will be the mean of the y distribution, and the scatter of the y
variable above and below the regression line (whose slope is zero) will be
equal to the scatter of the y variable as a whole. In other words, when r
equals zero, σ_est_y is equal to σ_y. If r is .10, we see from Table 16:1 that k is
.995. Thus, an increase from zero to .10 in the degree of correlation decreases
the error of estimate by only .005 of its maximum value when r is zero. Hence
a correlation coefficient of .10 is likewise little better than a guess. When r is
.30, the predictive efficiency is only 5% better than a guess because k is .95.
When r is .50, k is .866, and the predictive efficiency is about 13%. Table 16:1
shows that a correlation coefficient must be .87 for its predictive efficiency
to be 50% better than a guess. Even when r is .99, its predictive efficiency is
still far from perfect, being 86% better than a guess. The accuracy of prediction
thus increases very gradually as correlation coefficients diverge from
zero, and then much more rapidly as they approach 1.00 or -1.00.

* T. L. Kelley, “Principles Underlying the Classification of Men,” Journal of Applied
Psychology, 3:50-67, 1919.

† The interpretation of the standard error of estimate is based on the assumption of
a normal correlation surface, i.e., that each variable is normally distributed, and of
homoscedasticity, i.e., that the scatter or variability of all arrays (or class intervals)
of a variable is the same.


Table 16:1. Values of k, the Coefficient of Alienation √(1 - r²), for Values
of r from Zero to Plus or Minus 1.00*

  r      k        r      k        r      k        r      k
 .00   1.000     .25   .968      .50   .866      .75   .661
 .01    .999+    .26   .966      .51   .860      .76   .650
 .02    .999+    .27   .963      .52   .854      .77   .638
 .03    .999+    .28   .960      .53   .848      .78   .626
 .04    .999     .29   .957      .54   .842      .79   .613
 .05    .999     .30   .954      .55   .835      .80   .600
 .06    .998     .31   .951      .56   .828      .81   .586
 .07    .998     .32   .947      .57   .822      .82   .572
 .08    .997     .33   .944      .58   .815      .83   .558
 .09    .996     .34   .940      .59   .807      .84   .543
 .10    .995     .35   .937      .60   .800      .85   .527
 .11    .994     .36   .933      .61   .792      .86   .510
 .12    .993     .37   .929      .62   .785      .87   .493
 .13    .992     .38   .925      .63   .777      .88   .475
 .14    .990     .39   .921      .64   .768      .89   .456
 .15    .989     .40   .917      .65   .760      .90   .436
 .16    .987     .41   .912      .66   .751      .91   .415
 .17    .985     .42   .908      .67   .742      .92   .392
 .18    .984     .43   .903      .68   .733      .93   .368
 .19    .982     .44   .898      .69   .724      .94   .341
 .20    .980     .45   .893      .70   .714      .95   .312
 .21    .978     .46   .888      .71   .704      .96   .280
 .22    .976     .47   .883      .72   .694      .97   .243
 .23    .973     .48   .877      .73   .683      .98   .199
 .24    .971     .49   .872      .74   .673      .99   .141

* From Table V, Appendix B, pp. 516-517.


The accuracy of predictive estimates based upon product-moment correlation
can be concretely illustrated by means of the grade and intelligence
test score data used for the predictive estimates in Figs. 16:5 and 16:6. The
correlation coefficient for these data was .50, and the standard deviation of
the grade scores was 8%. The standard error of the grade scores predicted from
the test scores is thus:

$$\sigma_{est_y} = \sigma_y\sqrt{1 - r_{xy}^2} = 8.0\sqrt{1 - (.50)^2} = 8.0(.866) = 6.93 \text{ (or 7\%)}$$

Since the standard deviation of the y variable as a whole is 8% and the variation
of any predicted value of a grade score is approximately 7%, the prediction
when r is .50 is somewhat better than a guess. The proportionate
increase in efficiency of prediction, as against a sheer guess, is thus the difference
between an error of 8% (when r equals zero) and an error of about 7%
(when r equals .50), expressed as a proportion of 8%. The increase in efficiency
of prediction is therefore (8% - 7%)/8% = 12.5%; without rounding σ_est_y to 7%,
it is (8.0 - 6.93)/8.0, or about 13%, which is 1 - k for r = .50.
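Table 16:1 and the efficiency figures in the text follow directly from Formula 16:7. In this sketch "predictive efficiency" is taken to mean 1 - k, the proportionate reduction in the error of estimate compared with r = 0, which is how the text uses the term; the function names are illustrative.

```python
import math


def alienation(r):
    """Coefficient of alienation, Formula 16:7: k = sqrt(1 - r^2)."""
    return math.sqrt(1.0 - r * r)


def efficiency(r):
    """Proportionate reduction of the error of estimate relative to r = 0: 1 - k."""
    return 1.0 - alienation(r)


def std_error_of_estimate(sd, r):
    """Formulas 16:5 and 16:6: sigma_est = sigma * sqrt(1 - r^2)."""
    return sd * alienation(r)


for r in (0.10, 0.30, 0.50, 0.87, 0.99):
    print(f"r = {r:.2f}   k = {alienation(r):.3f}   efficiency = {efficiency(r):.1%}")

# Smallest r (in steps of .01) whose efficiency reaches 50%.
r_half = next(i / 100 for i in range(101) if efficiency(i / 100) >= 0.5)
print("r needed for 50% efficiency:", r_half)

# Grade example from the text: sigma_y = 8.0%, r = .50.
print(f"sigma_est for grades = {std_error_of_estimate(8.0, 0.50):.2f}")
```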


The Interpretation of the Error of Estimate 

The estimate of Y was found to be an average grade of 79% when predicted
from an intelligence test score of 95.0. The standard error of this
estimate is 7%. Let us now see how this error of estimate is interpreted in
relation to these bi-variates. On the assumption that the bi-variates are
normally distributed and that the variability of the grade scores in each
array is approximately equal, we can infer that in the long run approximately
68 out of 100 grade score predictions from a given intelligence test score will
vary within the limits of 79% ± 7%, or between 72% and 86%. This is the
range of plus and minus one standard error, which in the normal probability
distribution includes approximately .68 of the area.

In other words, on the basis of these sample results, students with intelligence
test scores of 95.0 will, on the average, have grade scores of 79%. In
the long run, approximately two-thirds may be expected to have grades
between 72% (C-) and 86% (B). These are the limits for a sampling distribution
of predicted scores ±1.0σ_est from the grade score of 79%, predicted from
an intelligence test score of 95.0. If we take into account a range of two
standard deviation units above and below the predicted grade score value in a
normally distributed sampling distribution, approximately 95 in 100 such
predictions should in the long run lie within the limits of 79% ± 2(7%), i.e.,
from 65% to 93%, or, in terms of letter grades, from D to A. Finally, since a
range of ±2.5 or 3.0 standard deviation units includes nearly 100% of the
area of the normal probability distribution, practically all grade score values
predicted from an intelligence test score of 95.0 will lie within the range
of 79% ± 2.5(7%), or 79% ± 3(7%). By the 2.5σ criterion these limits are
61.5% and 96.5%; and by the 3σ criterion they are 58% and 100% (100%
being the upper limit of actual possibility).

These estimated ranges of possible error (sampling variations) in prediction
are similar to the confidence limits already described in Chapter 13 for Tests
of Significance. In reality, what we have been doing is to set up Tests of
Significance for a continuum of hypothetical parameter estimates, the results
of which determine the range of acceptable or likely hypotheses. At the same
time, the limits for unlikely hypotheses are also established. Thus, by the
T ratio criterion of 2.5, we can reject with confidence the hypothesis that
students with intelligence test scores of 95.0 will have grade scores less than
61.5% (D-) or greater than 96.5% (A), these being the limits of the predicted
grade score value ±2.5 times the standard error of the estimate.
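The limits in the preceding paragraphs can be generated for any multiple of the standard error. This sketch makes the same assumptions as the text (normal arrays of equal variability); `math.erf` supplies the proportion of a normal curve lying within ±t standard deviations, and the only other adjustment is capping the upper limit at 100%. The constant names are illustrative.

```python
import math


def normal_coverage(t):
    """Proportion of a normal distribution within +/- t standard deviations."""
    return math.erf(t / math.sqrt(2.0))


PREDICTED, SE = 79.0, 7.0  # predicted grade (%) and rounded standard error of estimate

for t in (1.0, 2.0, 2.5, 3.0):
    low = PREDICTED - t * SE
    high = min(PREDICTED + t * SE, 100.0)  # 100% is the highest possible grade
    print(f"+/-{t:.1f} SE: {low:5.1f}% to {high:5.1f}%"
          f"  (covers about {normal_coverage(t):.1%} of a normal curve)")
```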

Thus, in predicting values of one variable from a given value of another, we
first estimate the range of most likely hypotheses in the light of the sample
result, and we then indicate the limits for unlikely values. We saw in Chapter
12 that values beyond a range of ±2.5 or ±3.0 standard error units are
unlikely in random sampling, and that values within the range of ±2.0 error
units are likely. Values lying between the criteria of 2.0 and 2.5 or 3.0 standard
error units define the range of questionable or doubtful hypotheses; that is, they
can be neither accepted nor rejected with confidence. We have seen that the
limits of likely hypotheses are determined by the variability of the distribution
itself when the correlation is zero. On the other hand, when the correlation
is .87, the limits for likely hypotheses are but half as great as they are when
r is zero. This is what is meant by the statement that the efficiency of predicted
estimates is 50% better than a guess when r is .87.


Graphic Representation of the Accuracy of Predictive Estimates 

In Fig. 16:8 the graphic technique employed in Figs. 16:6 and 16:7 has been
used further to illustrate the error characteristic of predictive estimates when
r is zero. All the predicted values for y are at the mean, regardless of the x
score from which they are predicted. Furthermore, the range of error is
assumed to be distributed normally and over limits similar to those of the y
variable itself. In other words, when r is zero we have no grounds for expecting
the limits of the value of one variable, as predicted from a given value of the
other, to be any narrower than the limits of the entire distribution. Although,
on the average and in the long run, we should expect the most likely value of
predicted scores to equal the mean of the distribution, the probability that a
predicted score will coincide with the mean is smaller than the probability that
it will lie somewhere else in the distribution. The mean is only the
modal point. This situation is analogous to the sampling distribution obtained
from tossing 20 coins, for which a result other than 10 heads is more likely
to occur than exactly 10 heads. In any event, when r is zero, any prediction
is a guess. Under the circumstances (i.e., a normal sampling distribution)
the best guess is the mean of the sample distribution.

[Fig. 16:8. Error of Estimate for y Predicted from Various Values of x When r_xy = zero (k = 1.00). Parallel scales for the y variable and the x variable; the range z_y ± σ_est is marked, with lower and upper limits at z_y - 3σ_est and z_y + 3σ_est.]
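The coin-tossing analogy is easy to verify. This minimal check assumes fair, independent tosses: 10 heads is the single most likely outcome (the modal point), yet its probability is well under one-half, so obtaining some other number of heads is the more likely event.

```python
from math import comb

# Probability of exactly 10 heads when 20 fair coins are tossed.
p_exactly_10 = comb(20, 10) / 2 ** 20
p_other = 1.0 - p_exactly_10

print(f"P(exactly 10 heads)  = {p_exactly_10:.3f}")
print(f"P(any other number)  = {p_other:.3f}")
print("10 heads is the modal outcome:",
      all(comb(20, k) <= comb(20, 10) for k in range(21)))
```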

Fig. 16:9 shows the range of variability or sampling error to be expected
in predictions from the mean of x, for varying degrees of correlation. This
figure further illustrates the fact that the range of the error for these estimates
decreases gradually as the correlation coefficient increases from zero. Since all
the predicted values of y are made from the mean of the x variable, the y
estimates are equal to the mean of the y variable.

[Fig. 16:9. Error of Estimate for Values of y Predicted from the Mean Value of x When r_xy = .00, .25, .50, .75, .90, and .99 (k = 1.00 when r = .00). X variable scaled in z score units from -3σ to +3σ.]

Fig. 16:10 illustrates the effect of correlation on the predicted estimates of 


[Figure: Fig. 16:10. Error of Estimate for Values of y Predicted from a Value of x Equal to a
z Score of 2.0, When $r_{xy}$ = .00, .25, .50, .75, .90, and .99]






THE ACCURACY OR EFFICIENCY OF PREDICTIONS 




y from x at a point two standard deviations above the mean of x, and the
range of error characteristic of such estimates. Six different degrees of
correlation are used, ranging from zero to a correlation of .99. The divergence of
the average estimate from the mean of the y distribution as r increases is
shown in relation to the range of expected sampling error. Again it is apparent
that the accuracy of prediction increases gradually as the correlation increases
from zero. The sampling error has a sizable range, even with a correlation of
.75. Thus the range of likely hypotheses for the values of y estimated from $z_x$
equal to 2.0 is fairly great, the limits of the estimate $\pm 2.0\,\sigma_{est}$ being equal to
$z_y$ scores of 0.18 and 2.8. When the correlation is .99, the limits for unlikely
hypotheses (by the T criterion of 3.0) are $z_y$ scores of 1.6 and 2.4. In other
words, y cannot be estimated from x without error even for such a high
correlation.
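The limits quoted in the paragraph above can be reproduced in z-score units. The predicted value is z_y = r·z_x, and the limits lie T standard errors of estimate (k = √(1 − r²)) on either side. This is a minimal Python sketch. The helper name `estimate_limits` is hypothetical.

```python
from __future__ import annotations

from math import sqrt


def estimate_limits(r: float, z_x: float, t: float) -> tuple[float, float, float]:
    """Return (predicted z_y, lower limit, upper limit) in z-score units.

    The prediction is z_y = r * z_x; the limits lie t standard errors of
    estimate away, where the standard error in z units is k = sqrt(1 - r^2).
    """
    z_y = r * z_x
    k = sqrt(1.0 - r * r)
    return z_y, z_y - t * k, z_y + t * k


# r = .75, likely hypotheses (T = 2.0), predicting from z_x = 2.0
z75, low75, high75 = estimate_limits(0.75, 2.0, 2.0)
# r = .99, unlikely hypotheses (T = 3.0), predicting from z_x = 2.0
z99, low99, high99 = estimate_limits(0.99, 2.0, 3.0)

print(f"r = .75: z_y = {z75:.2f}, limits {low75:.2f} to {high75:.2f}")  # 1.50, 0.18 to 2.82
print(f"r = .99: z_y = {z99:.2f}, limits {low99:.2f} to {high99:.2f}")  # 1.98, 1.56 to 2.40
```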

[Figure: Fig. 16:11. Error of Estimate for Values of y (Grades) Predicted from Thorndike
Intelligence Test Scores of 50 and 110, When $r_{xy}$ = .50 and .75]



The above predictive implications of correlation are brought together in
Fig. 16:11 for the intelligence test scores and grades used earlier in this
chapter. The grade scores are the y variable; they are scaled at the top of the figure
in five broad classes of letter ratings from F to A, with the average expectancy
of frequencies for each class given in percentages. The equivalent z score limits
of each class are given on the $z_y$ score scale in the middle of the figure. Thus,
a grade of C includes a range on the y variable whose limits in terms of
z scores are taken as −0.67 and +0.67. The universe is assumed to be
distributed normally. In such a distribution z score limits of −0.67 and +0.67
mark off the range of the middle 50% of the frequencies. The range of B grades
is from z score values of 0.67 to 1.65 and is taken so as to include 20% of the
frequencies. The range of D grades includes the same proportion of frequencies
as the B grade range, the z score limits being −0.67 and −1.65. The
remaining 10% of the frequencies are evenly divided between the A and F grades.
The A ratings lie beyond the z score limit of 1.65 and the F ratings lie below
the z score limit of −1.65.*
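The percentages assigned to each letter grade follow from the unit normal distribution. The minimal Python sketch below integrates the normal curve between the z score limits given above. It uses the error function from Python's standard `math` module. The dictionary `GRADE_LIMITS` is an illustrative encoding of the text's limits.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Proportion of a unit normal distribution lying below z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


# z-score limits of the five letter-grade classes, as given in the text
GRADE_LIMITS = {
    "A": (1.65, float("inf")),
    "B": (0.67, 1.65),
    "C": (-0.67, 0.67),
    "D": (-1.65, -0.67),
    "F": (float("-inf"), -1.65),
}

shares = {grade: normal_cdf(hi) - normal_cdf(lo) for grade, (lo, hi) in GRADE_LIMITS.items()}
for grade, share in shares.items():
    print(f"{grade}: {100 * share:4.1f}% of the frequencies")
```

The results come to roughly 5, 20, 50, 20, and 5 percent, so the rounded limits ±0.67 and ±1.65 reproduce the intended division closely.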

The predictive meaning of correlation is illustrated for coefficients of .50 
and .75. Estimated grades are predicted from intelligence test scores of
50 ($z_x$ = −2.0), 80 ($z_x$ = 0), and 110 ($z_x$ = 2.0). The coefficient of .50 is the
sample result and is typical of the correlations between intelligence test scores 
and college grades reported in the literature. The correlation of .75 is higher 
than any empirical value which has come to the writer’s attention. 

Fig. 16:11 illustrates both the predicted letter grades and their relation to 
their respective ranges of variation or error. Thus, the predicted estimate for 
students with an intelligence test score of 110 is a letter grade within the 
B range, regardless of whether the correlation is .50 or .75. However, if the 
correlation is only .50, the lower limit of unlikely results is a D. In other words,
we can reject as unlikely the hypothesis that individuals with intelligence 
test scores of 110 will have letter grades lower than D. The upper limit of 
unlikely results is the maximum rating, namely, an A. But the upper limit of 
likely hypotheses also lies within this same grade interval. Consequently, the 
upper limit really imposes no restriction on what is likely or unlikely for 
people with such an intelligence test score. 

If the correlation were .75 instead of .50, then, according to the figure, the 
lower limit for unlikely hypotheses would lie within the C interval, and
consequently it would be most unlikely that students with intelligence test scores
of 110 would have grade scores of less than C. In view of the sample results, 
i.e., r equals .50, the most plausible hypothesis is that most students with 
intelligence test scores of 110 will have scholastic averages ranging from C to A. 

When scores are predicted from values below the mean of x, it is the upper
range for unlikely hypotheses, rather than the lower, which is of major
concern. Thus, as indicated in Fig. 16:11, grade scores predicted from an
intelligence test score of 50 lie in the D interval, and the upper limit for unlikely
hypotheses is a grade of B when r equals .50, and a grade of C when r equals 
.75. In other words, it is unlikely that students with intelligence test scores of 
50 will have grade score averages above B when the correlation between the 
two is .50, and it is likewise not likely that their grade scores will be above C 
if the correlation is .75. 

* Such a division of letter grades for distributions assumed to be normal is often used
in educational circles, but is not thereby to be condoned. There are no immutable laws
operating in the scholastic natures of students which dictate that 5% are to fail, etc.

Finally, when grade scores are estimated from near the mean intelligence
test score, both the upper and the lower limits for unlikely hypotheses are
relevant in appraising the accuracy of the estimates. According to Fig. 16:11,
for a correlation of either .50 or .75, these limits are in the F and A grade
ranges for a T criterion of either 2.5 or 3.0. If a range of ±2.0 standard errors
is taken as the range of likely hypotheses, then when r equals .50 the grade
scores of students with intelligence test scores of 80, the mean value, will
most likely range from D to B.
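The letter-grade readings taken from Fig. 16:11 can be approximated numerically. The Python sketch below works in z-score units. It predicts z_y = r·z_x, places the limits for unlikely hypotheses at z_y ± 2.5k, and converts each value to a letter using the class limits defined earlier. The names `letter_grade` and `grade_limits` are hypothetical. The code also assumes that a value falling exactly on a class limit belongs to the higher class, because the text does not settle that case.

```python
from __future__ import annotations

from math import sqrt


def letter_grade(z: float) -> str:
    """Map a z score on the grade scale to a letter, using the text's class limits.

    Assumption: a score falling exactly on a limit is assigned to the higher class.
    """
    if z >= 1.65:
        return "A"
    if z >= 0.67:
        return "B"
    if z >= -0.67:
        return "C"
    if z >= -1.65:
        return "D"
    return "F"


def grade_limits(r: float, z_x: float, t: float = 2.5) -> tuple[str, str, str]:
    """Predicted letter grade, and the letters at z_y - t*k and z_y + t*k."""
    z_y = r * z_x
    k = sqrt(1.0 - r * r)
    return letter_grade(z_y), letter_grade(z_y - t * k), letter_grade(z_y + t * k)


# Intelligence test scores of 50, 80, and 110 correspond to z_x = -2.0, 0.0, and +2.0.
results = {}
for score, z_x in ((50, -2.0), (80, 0.0), (110, 2.0)):
    for r in (0.50, 0.75):
        predicted, low, high = grade_limits(r, z_x)
        results[(score, r)] = (predicted, low, high)
        print(f"score {score:>3}, r = {r:.2f}: predicted {predicted}, T = 2.5 limits in {low} and {high}")
```

For a score of 110, the lower limit moves from D (r = .50) to C (r = .75). For a score of 50, the upper limit moves from B to C. In both cases the predicted grade itself stays the same, and only the limits move.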


The Index of Predictive Efficiency (E) 

Since k, the coefficient of alienation, measures the error or scatter of estimates
of a variable predicted from given values of a second variable as a proportion of
the error when r is zero, the efficiency of prediction can be measured, or indexed.
The index commonly used for this purpose, and symbolized as E by Clark Hull,*
expresses as a percentage the proportionate reduction in the error of estimate
from the maximum error characteristic of zero correlation. It is given by the
following:

[16:8] Index of predictive efficiency, E

$$E = 100\%\left(1 - \sqrt{1 - r_{xy}^{2}}\right) = 100\%\,(1 - k)$$

The E values for varying degrees of correlation from zero to plus or minus
1.00 are given in Table 16:2. Thus, a coefficient of .60 has a predictive efficiency
20% better than a sheer guess. That is, when r equals .60, the range of likely
hypotheses for any predictive estimate is 20% less than the range of such
hypotheses when r equals zero.

When r is .866, the predictive efficiency is 50%. In other words, a correlation
must be .866 in order for the range of error to be only 50% as great as in
the case of a guess, i.e., zero correlation. If r equals .95, the predictive efficiency
is 69%. If r is 1.00, its efficiency of prediction is 100%, since there is no error or
scatter for a perfect correlation.

Table 16:2. The Index of Predictive Efficiency, E, for Values of r from
Zero to Plus or Minus 1.00

$$E = 100\%\left(1 - \sqrt{1 - r_{xy}^{2}}\right)$$

r        E     r        E     r        E     r        E
.00   0.0%     .25   3.2%     .50  13.4%     .75  33.9%
.01   <0.1     .26    3.4     .51   14.0     .76   35.0
.02   <0.1     .27    3.7     .52   14.6     .77   36.2
.03   <0.1     .28    4.0     .53   15.2     .78   37.4
.04    0.1     .29    4.3     .54   15.8     .79   38.7
.05    0.1     .30    4.6     .55   16.5     .80   40.0
.06    0.2     .31    4.9     .56   17.2     .81   41.4
.07    0.2     .32    5.3     .57   17.8     .82   42.8
.08    0.3     .33    5.6     .58   18.5     .83   44.2
.09    0.4     .34    6.0     .59   19.3     .84   45.7
.10    0.5     .35    6.3     .60   20.0     .85   47.3
.11    0.6     .36    6.7     .61   20.8     .86   49.0
.12    0.7     .37    7.1     .62   21.5     .866  50.0
.13    0.8     .38    7.5     .63   22.3     .87   50.7
.14    1.0     .39    7.9     .64   23.2     .88   52.5
.15    1.1     .40    8.3     .65   24.0     .89   54.4
.16    1.3     .41    8.8     .66   24.9     .90   56.4
.17    1.5     .42    9.2     .67   25.8     .91   58.5
.18    1.6     .43    9.7     .68   26.7     .92   60.8
.19    1.8     .44   10.2     .69   27.6     .93   63.2
.20    2.0     .45   10.7     .70   28.6     .94   65.9
.21    2.2     .46   11.2     .71   29.6     .95   68.8
.22    2.4     .47   11.7     .72   30.6     .96   72.0
.23    2.7     .48   12.3     .73   31.7     .97   75.7
.24    2.9     .49   12.8     .74   32.7     .98   80.1
                                             .99   85.9
                                             1.00 100.0

* C. Hull, Aptitude Testing, World Book Co., Yonkers, 1928.
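Table 16:2 can be regenerated from Formula 16:8. The minimal Python sketch below does this (the function name `predictive_efficiency` is illustrative). Because E depends only on r², a negative coefficient yields the same efficiency as a positive one of the same size.

```python
from math import sqrt


def predictive_efficiency(r: float) -> float:
    """Index of predictive efficiency E = 100% * (1 - sqrt(1 - r^2)), in percent."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    return 100.0 * (1.0 - sqrt(1.0 - r * r))


# A few rows of Table 16:2; E depends only on r^2, so the sign of r is irrelevant.
for r in (0.25, 0.50, 0.60, 0.75, 0.866, 0.90, 0.95, 0.99, 1.00):
    print(f"r = {r:<5}  E = {predictive_efficiency(r):5.1f}%")
```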


Standard Error of Estimate for the Mean ($\sigma_{est\,M}$)

A group result in terms of its mean can be predicted more accurately than
can a particular score because the standard deviation of a sampling
distribution of means is considerably less than it is for a sampling distribution of
particular scores. The latter measure is given by the standard error of
estimate, and, as we have seen, is a function of the scatter in the bi-variate
distribution. The standard error of estimate of the mean of y from the mean
of x, on the other hand, is given by the following:

[16:9] Standard error of estimate for the mean of one variable predicted
from the mean of a correlated variable

$$\sigma_{est\,M} = \frac{\sigma_y}{\sqrt{N_y - 1}}\,\sqrt{1 - r_{xy}^{2}}$$

where $\sigma_y/\sqrt{N_y - 1}$ is the standard error of a mean (Formula 13:5a) and
$\sqrt{1 - r_{xy}^{2}}$ is k.

When predicted from a correlated variable, the standard error of a mean is
thus reduced by the factor k, or, from the point of view of the standard
error of estimate of y on x, the error in predicting a mean score is reduced
by the factor $\sqrt{N_y - 1}$. (Note that when $N_y$ is large, this value can be taken simply
as $\sqrt{N_y}$.)

A correlation coefficient of .50 is considerably more efficient for predicting
a mean score than a correlation coefficient of .90 for predicting particular
scores. When r equals .90, k is .44 (Table 16:1), and E is 56% (Table 16:2).
In predicting a mean score, even if r is only .50 for a sample of 100 cases, the
standard error of the mean estimate, in units of $\sigma_y$, is $k/\sqrt{100}$, which is
one-tenth of .87, or .087. The index of predictive efficiency, E, is therefore
100%(1.00 − .087) = 91%.
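The comparison in the last paragraph can be reproduced arithmetically. Like the text, this Python sketch uses √N in place of √(N − 1) for a sample of 100 cases and expresses the standard error in units of σ_y. Both function names are illustrative.

```python
from math import sqrt


def efficiency_for_mean(r: float, n_cases: int) -> float:
    """E for predicting a mean: 100% * (1 - k / sqrt(n_cases)), with k = sqrt(1 - r^2).

    As in the text, sqrt(N) stands in for sqrt(N - 1) when N is large, and the
    standard error is expressed in units of sigma_y.
    """
    k = sqrt(1.0 - r * r)
    return 100.0 * (1.0 - k / sqrt(n_cases))


def efficiency_for_score(r: float) -> float:
    """E for predicting a particular score: 100% * (1 - k)."""
    return 100.0 * (1.0 - sqrt(1.0 - r * r))


e_mean = efficiency_for_mean(0.50, 100)   # about 91%
e_score = efficiency_for_score(0.90)      # about 56%
print(f"mean, r = .50, N = 100: E = {e_mean:.0f}%")
print(f"single score, r = .90:  E = {e_score:.0f}%")
```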





Thus in the correlation between students’ intelligence test scores and their 
college grades, the standard error of estimate of a predicted mean grade is 
as follows: 


We can therefore be confident that the mean of the universe from which the
sample results were derived will lie within ±2.5(0.7%) = ±1.75 grade points.
This estimate is fairly precise, E being 91% as indicated above.


Tests of Significance for Predictive Estimates 

The logic of Tests of Significance for predictive estimates is the same as
for the statistics already considered:

$$T = \frac{s - h}{\sigma_s}$$

where s is the sample value, h is the hypothetical parameter value, and $\sigma_s$ is
the standard error of the measure or statistic under consideration. In a Test
of Significance for a predictive estimate, s is the predicted value based on the
sample bi-variate distribution, h is the parameter value of a relevant hypothesis,
and $\sigma_s$ is the standard error of estimate, $\sigma_y\sqrt{1 - r_{xy}^{2}}$.

We shall present a Test of Significance for a predictive estimate with the
data from Fig. 16:11. Is it likely, when $r_{xy}$ = .50, that students with
intelligence test scores of 50 will have grade scores equal to or greater than A−?
The hypothesis to be tested is that the parameter value of the predictive
estimate is equal to at least a grade score of A−, which we shall consider as
a percentage grade of 90. We have already seen, in Fig. 16:11, that the grade
score predicted for students with intelligence test scores of 50 is 1.0
standard deviation below the mean grade. Since the mean grade is 75% and the
standard deviation of the grade distribution is 8%, the predicted score in
terms of percentages is 67%. The standard error of estimate is $8\%\sqrt{1 - (.50)^{2}}$
= 6.9%. The Test of Significance for the hypothesis of a grade of A− or better
is therefore as follows:

$$T = \frac{67\% - 90\%}{6.9\%} = \frac{-23\%}{6.9\%} = -3.3$$

Since the T ratio is 3.3 in absolute value, beyond even the stricter criterion of
3.0, we can reject the hypothesis with confidence. In other
words, it is most unlikely that students with intelligence test scores of no
more than 50 will have grade scores as high as 90%. It should be noted that
this conclusion does not, and logically cannot, rule out the possibility of an
individual exception. As we have repeatedly emphasized, the application of
the theory of probabilities to empirical data is based on what under certain
circumstances can be expected to occur in the long run, and hence the
improbable case is not excluded.
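The same test can be worked numerically. The constants below are those given in the text: a mean grade of 75%, a standard deviation of 8%, r = .50, and z_x = −2.0 for a score of 50. The helper `t_ratio` is an illustrative name for T = (s − h)/σ_s. Carried through, the ratio comes out near −3.3, which exceeds both the 2.5 and the 3.0 criteria for unlikely hypotheses in absolute value.

```python
from math import sqrt

MEAN_GRADE = 75.0   # percent, from the text
SD_GRADE = 8.0      # percent, from the text
R_XY = 0.50
Z_X = -2.0          # intelligence test score of 50


def t_ratio(sample: float, hypothesis: float, standard_error: float) -> float:
    """T = (s - h) / sigma_s."""
    return (sample - hypothesis) / standard_error


predicted = MEAN_GRADE + R_XY * Z_X * SD_GRADE      # 67%
se_est = SD_GRADE * sqrt(1.0 - R_XY ** 2)           # about 6.9%
t_a_minus = t_ratio(predicted, 90.0, se_est)        # about -3.3

print(f"predicted grade {predicted:.0f}%, standard error {se_est:.1f}%, T = {t_a_minus:.1f}")
```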





We shall next test the hypothesis that students with intelligence test scores
of no more than 50 will have grade scores of C or better. With 75%
representing the parameter value of this hypothesis, and with the data from the
preceding example, the Test of Significance is as follows:

$$T = \frac{67\% - 75\%}{6.9\%} = \frac{-8.0}{6.9} = -1.2$$

This time the test ratio is only 1.2 in absolute value, and hence the hypothesis
cannot be rejected. Furthermore, since a T ratio of 2.0 or less can be taken as a
criterion for likely hypotheses, this hypothesis is likely. In other words, at least
some students with intelligence test scores of 50 will in all likelihood have grade
scores of C or better. However, many other hypotheses are also likely. Such hypotheses
are denoted by confidence limits set up in terms of criteria equal to ±2.0
standard error units from the sample value. Since the standard error of
estimate is 6.9%, the confidence limits for likely hypotheses will be 67% ± 2(6.9%),
or approximately 53% and 81% (letter grades of F and B−). Since a T ratio
of 2.5 has been used as the criterion for unlikely hypotheses, such hypotheses
would lie in the range of possible grade score values below 67% − 2.5(6.9%)
= 50%, and above 67% + 2.5(6.9%) = 84%. In other words, grade values of
below 50% (F) or above 84% (B) are unlikely for students with intelligence
test scores of 50.
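The second test and the two sets of limits can be computed directly from the predicted grade of 67% and the standard error of estimate. The Python sketch below uses only values stated in the text. The variable names are illustrative.

```python
from math import sqrt

predicted = 67.0                               # predicted grade, percent
se_est = 8.0 * sqrt(1.0 - 0.50 ** 2)           # standard error of estimate, about 6.9%

t_c = (predicted - 75.0) / se_est              # about -1.2: within the 2.0 criterion
likely = (predicted - 2.0 * se_est, predicted + 2.0 * se_est)           # about 53% and 81%
unlikely_beyond = (predicted - 2.5 * se_est, predicted + 2.5 * se_est)  # about 50% and 84%

print(f"T = {t_c:.1f}")
print(f"likely hypotheses: {likely[0]:.0f}% to {likely[1]:.0f}%")
print(f"unlikely below {unlikely_beyond[0]:.0f}% or above {unlikely_beyond[1]:.0f}%")
```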


Summary 

From the preceding sections, it should be evident that correlation
coefficients of less than .30 have little value for predictive purposes. Even a
coefficient of .50 or .60 does not yield a very accurate estimate of y from x.
Correlation coefficients in the .80’s or .90’s are high from the point of view of their
predictive efficiency. 

However, coefficients of .40 or .50 may be useful in predicting upper score 
limits for variable y from low scores of x, or lower score limits for y from high 
scores of x. When considered in relation to other coefficients by the multiple 
correlation method (cf. Chapter 17, Section E), correlations as low as .20 or 
.30 may even be of value in increasing the predictive efficiency of a battery 
of tests. 

Whether or not a correlation of .50, for example, is low, fair, or high thus 
cannot be answered categorically; it depends upon the nature of the
situation. In the following chapter we shall consider the use of the technique of
correlation in evaluating psychological tests, a field in which an
understanding of the predictive implications of coefficients of correlation is particularly
essential for an adequate interpretation of results. 





EXERCISES 

1. Distinguish between making a prediction and evaluating its accuracy.

2. Describe how the accuracy or efficiency with which predictions can be made is 
dependent upon the extent of scatter in a bi-variate distribution. 

Use the results obtained for Exercises 10 and 13 of Chapter 9 for the following two 
problems: 

3. Determine the efficiency with which the intelligence test scores of the freshmen’s 
best friends can be predicted from the freshmen’s intelligence test scores of 90. 
Draw a graph to illustrate. 

4. Determine the efficiency with which the average grade index of the freshmen’s 
best friends can be predicted from the freshmen’s average grade of 65. Illustrate 
with a graph. 



CHAPTER 17 


Correlation Methods for the Evaluation 
of Psychological Tests 


The technique of correlation has come to be recognized as an indispensable
statistical tool for the appraisal and evaluation of psychological test
procedures.* The developments in this field are well illustrated by the contrast
in the psychological test procedures used in World War I and in World War II.
Psychological tests for measuring “intelligence” were used for the first time
on a large scale during the First World War, when a group of outstanding
psychologists developed the Army Alpha and Beta tests for the purpose of
differentiating aptitude for army work. By and large, these tests were based
upon considerations more rational than empirical. However, this implies no
criticism of the work of these psychologists, for they proceeded in accordance
with the best standards, information, etc., available at the time. World
War II saw the use of many different kinds of psychological tests and
classification procedures by all branches of the Armed Forces, the result of the
wealth of empirical knowledge accumulated during the intervening twenty-five
years, as well as of the continuing process of validation during the war itself.
In all such work in test development the technique of correlation is
indispensable to any adequate evaluation of results.

The evaluation of psychological tests is usually approached with two
basic considerations in mind, namely, test reliability and test validity; they
will be considered in this chapter. A third aspect of the problem has received
much attention since 1930: the organization of abilities or aptitudes, i.e., how
they are manifest and interrelated. The treatment of this aspect has been
essentially statistical, and the techniques involved are usually known as
factor or cluster analysis; they are considered in Chapter 18.

Reliability and Validity a Question of Degree 

A test is said to be reliable if it is accurate or consistent, and to be valid if 
it measures what it is supposed to measure. For practical purposes, however, 
these definitions need to be clarified and restated in terms susceptible to 
empirical analysis. Thus, a test is reliable and valid to the extent that its 

* Cf. J. G. Peatman, “On the Meaning of a Test Score,” American Journal of
Orthopsychiatry, 9:23^7, 1939. For a review of the recent literature, see H. S. Conrad,
“Statistical Methods Related to Test Construction and Evaluation,” Review of
Educational Research, 14ilia-126, 1944.







BAROMETER AND PSYCHOLOGICAL TEST COMPARED 




results enable the prediction of various kinds of behavior in educational, 
occupational, or social situations. This prediction is a matter of degree, and 
hence the consistency with which a test differentiates performance is likewise 
a matter of degree. The empirical problem, therefore, involves determining 
the degree or extent to which a test is reliable and valid, rather than
attempting to answer such poorly framed questions as “Is it reliable?” or “Is
it valid?” 

A. THE RELIABILITY AND VALIDITY OF A BAROMETER
AND OF A PSYCHOLOGICAL TEST 

Inasmuch as analogies are oftentimes helpful, we shall illustrate the
parallel implications of reliability and validity in terms of physical
measurement. That a Torricelli barometer provides a relevant analogy is evidenced
by the frequent allusion in the social sciences to the “barometric” character 
of this or that measure or index. 

The Barometer 

What does a barometer measure? It has been experimentally established 
that it measures atmospheric pressure. The measurement itself is obtained in 
terms of the height—in inches, millimeters, or bars—of a mercury column 
in a tube. The behavior of the mercury column is closely correlated with 
changes in atmospheric pressure. The correlation is perfect except for errors 
of observation and errors implicit in the instrument. If the barometer is well 
designed, the error averages only .005 of an inch when the tube is one-quarter 
of an inch in diameter, and only .002 of an inch if the diameter is .5 inch.
This, then, is the index of the reliability of the barometer, from which it 
follows that the differentiations of atmospheric pressure are highly reliable. 
The reliability of the instrument is determined by making repeated readings 
under experimentally varied and highly controlled conditions. When employed 
under similar circumstances, a barometer is found to yield highly consistent 
measures. The accuracy of the measurement varies to some extent with 
differences in the composition of the atmosphere and the size of the tube. 
But the fact remains that a well-constructed barometer is so reliable an 
instrument that if its reliability were expressed as a correlation coefficient, 
r would approach 1.00. 

How about its “validity”? (1) We have already indicated that a barometer 
measures atmospheric pressure. This can be designated as its operational
validity, i.e., validity in terms of operations which, in the case of this
instrument, correlate perfectly with atmospheric pressure. (2) It can also be used to
predict a type of physical behavior which is distinct from atmospheric
pressure as such, that is, changes in weather. This is one of the most important
practical functions of a barometer, and can be designated as one aspect of 
its functional validity. It can also be used to gauge the altitude above or 



CORRELATION METHODS FOR EVALUATING PSYCHOLOGICAL TESTS


below sea level. At sea level, the barometric measure is about 30 inches; at 
1000 feet below, the measure is about 31 inches; at 1000 feet above, it is about 
29 inches; and at 50,000 feet above, it is about 3½ inches. The practical
implications of such an instrument in air navigation are obvious. Its functional
validity in relation to weather and altitude is measured by the degree of 
correspondence between the readings and the other physical factors. 

The Psychological Test 

Now let us examine the analogy between a barometer and a psychological 
test. For this we shall use the Minnesota Vocational Test for Clerical Workers,* 
and we shall take up the analogous points in turn: (1) the immediate character 
of the measures (test scores) obtained; (2) the reliability of the instrument 
(test); and (3) the validity of the measures (test scores), with respect to both 
operational validity and functional implications. 

Measures Obtained (Test Scores) 

The Minnesota Clerical Test consists of two parts, number-checking and 
name-checking. Numbers and names are each listed in pairs, the subject’s 
task being to discriminate dissimilar and identical pairs. Two measures are 
obtained: (1) the total number of correctly discriminated number-pairs, and 
(2) the total number of correctly discriminated name-pairs. Since the task is 
fairly simple for most adults, there is a time limit, and speed thus becomes an 
integral part of the meaning of the score. 

In some psychological tests, the measures may be obtained in inches (as 
in steadiness tests) or seconds or minutes (as in any amount-limit test,
reaction-time tasks, etc.). In the Minnesota Clerical Test, however, the measure is a
count: an enumeration of the total number of tasks correctly performed.
This type of measure is characteristic of the kind obtained with many other 
psychological tests. 

The Reliability of the Measures Obtained 

The usual index of reliability of a test is a correlation coefficient which 
measures the consistency with which the abilities sampled are differentiated. 
It is only indirectly analogous to the index of reliability of a barometer 
(which, as we have seen, is taken in terms of the expected error and may 
average only .005 or .002 of an inch). Furthermore, the reliability coefficient 
of a test is obtained by one of several methods to be described in Section B, 
and is in itself a somewhat unsatisfactory measure because the correlation 
obtained is affected by the range of ability of the sample used in standardizing 
the test. Thus, an estimate of test reliability based on a sample of college 


* D. M. Andrew, D. G. Paterson, and H. P. Longstaff, Minnesota Vocational Test for
Clerical Workers, Psychological Corporation, New York, 1933.





freshmen will most likely be lower than one based on all 18-year-olds.
Fortunately, however, the standard error of a test score, $\sigma_X$ (cf. Chapter 13), takes
into account the variability of a sample group and at the same time yields 
an index of reliability more immediately meaningful and useful than the 
usual reliability coefficients given for tests. 

The Standard Error of a Test Score ($\sigma_X$). A test is used to obtain measures
that can be identified with a scale, or series of measures, which will signify
for each test score a relatively consistent placement, or position, on the scale.
The scale yielded by a test is assumed to represent a continuum that ranges
from the least to the greatest degree of the abilities being measured. (This
assumption of continuity may be difficult to justify in the case of personality
and interest inventories.) The behavior of a mercury column in a barometer
is similarly assumed to be scaled with respect to such a continuum. The higher
the reading, the greater the atmospheric pressure; the lower the reading, the
less the pressure. The problem of measuring the reliability of an
instrument, therefore, depends on determining the accuracy of the location of each
measure on the scale: Is its position subject to a large range of error, or to
such a small range as to be negligible for all practical purposes? For any
psychological test, the standard error of a test score, $\sigma_X$, gives an estimate of
the accuracy of the result, in terms of its location on the scale. Hence, it is
the most practical and meaningful index of the reliability of a test.

If a person receives a score of 150 on number-checking in the Minnesota 
Test for Clerical Workers, is this significantly greater than the median 
score of 144 obtained from a sample of employed clerical workers? Can a 
difference of six units on the scale be expected on the basis of chance, or is 
the score of 150 significantly greater than the median score value of 144? 
The following Test of Significance answers these questions: 



$$T = \frac{X_i - X_h}{\sigma_X}$$

where $X_i$ is the individual score, $X_h$ is the value of the hypothesis tested (in
this case, the median of 144), and $\sigma_X$ is an estimate of the standard error of
a test score, which we saw earlier is equal to the following:

$$\sigma_X = \sigma_x\sqrt{1 - r_{xx}}$$

where $\sigma_x$ is the standard deviation of the distribution of measures, and $r_{xx}$ is a
measure of the reliability of the test.* Reliability coefficients as high as .91
have been reported for this test; consequently, this value will be used to
simplify the example. If the standard deviation of the distribution is taken as
25 units, the standard error of a score will be:

$$\sigma_X = 25\sqrt{1 - .91} = 25(.30) = 7.5$$


* See Table V, Appendix B, for values of $\sqrt{1 - r}$.





This value of 7.5 is analogous in its implications about reliability to the
index of reliability of the barometer. When the reliability coefficient of a
test is .90, the standard error of an individual score is about one-third the
standard deviation of the distribution. 
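The standard error of a score follows directly from the formula above. This Python sketch (the function name `score_standard_error` is illustrative) reproduces the value of 7.5. It also shows that a reliability of .90 leaves a standard error of about .32 of the standard deviation, which is roughly one-third.

```python
from math import sqrt


def score_standard_error(sd: float, reliability: float) -> float:
    """Standard error of a test score: sd * sqrt(1 - r_xx)."""
    return sd * sqrt(1.0 - reliability)


se_score = score_standard_error(25.0, 0.91)     # 25 * .30 = 7.5 scale units
share_at_90 = score_standard_error(1.0, 0.90)   # as a fraction of the SD: about .32
print(f"standard error of a score: {se_score:.1f}")
print(f"with r_xx = .90, SE / SD = {share_at_90:.2f}")
```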

We now have the necessary values for the above Test of Significance: 

$$T = \frac{150 - 144}{\sigma_X} = \frac{6.0}{7.5} = 0.8$$

This T ratio, 0.8, is too small to warrant the conclusion that a score of 150 is 
significantly greater than the median of 144. 

What result would be significantly greater than a median performance of 
144? If a T ratio of 2.5 is taken as the criterion for a significant difference, 
then 2.5 times the standard error of a score will give the desired estimate, 
2.5(7.5), which equals 18.75, a difference of 19 units in the scale. If the score 
is 169 rather than 150, we can conclude that it is significantly greater than 
the median. 
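The significance test for the clerical score can be sketched in a few lines of Python, using the median, standard deviation, and reliability assumed in the text. The sketch computes T for a score of 150, the difference needed to reach the 2.5 criterion, and T for a score of 169.

```python
from math import sqrt

MEDIAN = 144.0        # median of employed clerical workers
SD = 25.0             # standard deviation assumed in the text
RELIABILITY = 0.91    # reliability coefficient used in the text

se_score = SD * sqrt(1.0 - RELIABILITY)     # 7.5

t_150 = (150.0 - MEDIAN) / se_score         # 0.8: too small to be significant
critical_diff = 2.5 * se_score              # 18.75 scale units for T = 2.5
t_169 = (169.0 - MEDIAN) / se_score         # about 3.3: significant

print(f"T for 150: {t_150:.1f}")
print(f"difference needed for T = 2.5: {critical_diff:.2f}")
print(f"T for 169: {t_169:.1f}")
```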

A test is thus more reliable, the narrower, so to speak, the range of possible 
error of a particular score on the scale. This was also the case for the
barometer, whose error is no greater than a few thousandths of a unit (inch) on the
scale. If the location of a test score on a psychological scale is subject to a 
wide margin of error, the test will not be very useful. 

The Validity of Test Scores 

What about the validity of the Minnesota Clerical Test? As in the case of 
the barometer, we must distinguish between its operational and its functional 
validity. 

Operational Validity. The Minnesota Test measures two closely related
types of psychological functions, viz., the speed of number-checking and 
name-checking. We can assume that scores on the test are directly correlated 
with differences in ability to perform these tasks. Although this correlation 
should theoretically be perfect, it is not, because of errors of response and 
those arising in administering and scoring the test. Furthermore, the degree 
to which these abilities are imperfectly measured is indicated by the
reliability of the test scores.

It is difficult, if not impossible, satisfactorily to summarize the operational 
character of some psychological instruments, as for example, a personality 
or an interest inventory. On the other hand, a test may have useful functional 
implications in diagnosing or in predicting behavior in life situations, and at 
the same time have no agreed-upon operational validity. Fortunately, from 
the point of view of counseling and measurement, this is not a handicap. 
When a test of “general mental ability” is shown empirically to be valuable 





in predicting scholastic aptitude, or aptitude for a particular occupational 
situation, it makes little difference what it is called, unless its name has been 
loaded historically with emotion or misleading implications, such as the 
term “intelligence.” 

To summarize, simply naming a test does not establish its operational
validity; rather, what it measures is dependent upon a sound analysis of the
tasks and responses directly involved. The question still remains, however,
whether it will satisfactorily predict behavior in this or that occupational 
situation. This is the problem of its functional validity. 

Functional Validity. We saw that a barometer is used in the
prediction of changes in the weather as well as in the measurement of distance
above or below sea level. The latter involves practically no error because of 
the extremely high correlation with atmospheric pressure. However, weather 
predictions cannot be so accurate because factors other than atmospheric 
pressure alone are involved. The situation in psychological measurement is 
even more complex because behavior in a given situation is influenced by a 
multiplicity of factors, many of which are unknown or immeasurable. Thus 
clerical success depends upon many more factors than number-checking or 
name-checking. Consequently, the empirical problem is to determine whether 
a critical score for the test can be established that will generally differentiate 
the members of a given population who are likely to succeed from those who 
are likely to fail. 

Critical Scores 

In practice, it is desirable to set two critical scores: one score that will
permit the selection of individuals most likely to succeed; and another, lower
score that will delimit those most likely to fail. The values between these two 
scores constitute a range whose implications are doubtful. Such a procedure 
takes into account the unreliability of the test scores. The authors of the 
Minnesota Clerical Test make no attempt to determine such scores; instead, 
they say that “the critical scores for the selection of employees for a given 
occupation should be determined by the hiring standards for the particular 
company.” This is sound; in fact, it is the only practical procedure. Business 
and industrial organizations should develop their own critical scores on the 
basis of practical experience. 

Still another critical score that is sometimes useful is an upper critical 
score such that values beyond it are more indicative of failure than of success. 
Many organizations have found that people who have very high scores on a 
mental aptitude or clerical test do not do well in routine jobs, not because 
they lack ability, but because the jobs are not sufficiently interesting to continue
to motivate them. Aptitude is compounded of both potential abilities for
and interest in the tasks to be performed. 
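The banding rule in the two paragraphs above can be sketched in a few lines. This is a minimal sketch, not the procedure of the Minnesota authors: the function name `classify` and the cut-off values (90, 120, 170) are hypothetical, since the text insists that each organization set its own critical scores from practical experience.

```python
def classify(score, lower, upper, ceiling=None):
    """Assign a test score to one of the decision bands described in the text.

    lower   -- critical score at or below which failure is most likely
    upper   -- critical score at or above which success is most likely
    ceiling -- optional upper critical score; scores beyond it may signal
               that routine work will not hold the person's interest
    All cut-offs are hypothetical and must be set empirically.
    """
    if not lower < upper:
        raise ValueError("lower critical score must be below the upper one")
    if ceiling is not None and ceiling <= upper:
        raise ValueError("ceiling must lie above the upper critical score")
    if ceiling is not None and score > ceiling:
        return "beyond ceiling"
    if score >= upper:
        return "select"
    if score <= lower:
        return "reject"
    return "doubtful"  # the range whose implications are doubtful


# Hypothetical cut-offs for a clerical test.
for s in (85, 100, 130, 180):
    print(s, classify(s, lower=90, upper=120, ceiling=170))
```

Leaving `ceiling` unset reproduces the simpler two-score scheme.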





B. THE DETERMINATION OF TEST RELIABILITY 

Although we might “measure” distance with an elastic yardstick, we
obviously could not be assured of the consistency of our measurements.
Yardsticks must be constructed in such a way that the measurements
obtained with them will be consistent. Similarly, a watch or clock will
yield a “measure” of time as long as it is running. However, unless it
measures seconds, minutes, and hours so that it keeps correct time, it is
unreliable and consequently unsatisfactory, even though time itself is
“measured.”

The measurement of distance, time, weight, etc., is based upon the
centimeter-gram-second (c.g.s.) system. For each of these types of measurement,
the fundamental problem is not validity but reliability, because the measurement
of distance, time, or weight is inherent in the operation performed.
That is, by definition, distance, time, and weight are each measured by a
series of appropriate and well-standardized operations. In all such cases,
there is no doubt that the measuring instrument is yielding an observation
of the required kind. In other words, there is no question about the operational
validity. The real question in each case is the degree of reliability of each
type of measurement.

We have seen that the operational validity of the Minnesota Clerical Test is
analogous to that of measures based on the barometer. Another example is
the reaction-time experiment, whose operations measure the speed with
which an individual can react to a stimulus. The result is given in terms of
time, and there is no question but that reaction time is measured. The basic
question concerns the reliability of the result.

The operational validity of an intelligence test or of a personality inventory 
is much more difficult to define. It is not sufficient to say that “intelligence” 
is that which is measured by an intelligence test. Such a definition is useless 
unless the operations actually involved in the test are described and understood
in detail. Even then, the definition is often unsatisfactory because the
avowed purpose of an intelligence test is to differentiate what people can be 
expected to do in various life situations. The necessary empirical approach to 
determining the validity of an intelligence test is the functional approach of 
ascertaining the kinds of behavior which are predictable, at least to some 
degree, from such tests. 

The Correlation Index of Test Reliability 

Since the product-moment correlation coefficient is an index of the degree 
of co-variation between bi-variates, the reliability of a test can be expressed 
in terms of it. The greater the consistent differentiation of the quality or trait 
measured by the test, the more reliable the test. If, for example, a test is 
administered and the results obtained are correlated with those obtained 
from the same people on a readministration of the same test, the technique
of correlation can be applied to ascertain the degree to which the test differentiates
the individuals in relatively the same way on both administrations. If
the correlation is low, the test is no more reliable than an elastic yardstick. 
If it is high (.90 or more), it should give a fairly satisfactory differentiation. 
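A minimal sketch of the correlation step, assuming invented scores for eight people. `pearson_r` is a hypothetical helper written out from the definition of the product-moment coefficient, not a routine from the text.

```python
from statistics import mean, pstdev


def pearson_r(x, y):
    """Product-moment r: mean product of paired deviations divided by
    the product of the two standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists of at least two scores")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))


# Hypothetical scores of the same eight people on two administrations.
first = [12, 15, 9, 20, 17, 11, 14, 18]
second = [13, 16, 10, 19, 18, 12, 13, 19]
r_xx = pearson_r(first, second)
print(f"test-retest r_xx = {r_xx:.2f}")
```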

Unfortunately, there is no one best procedure for obtaining an index of the 
reliability of a test. The most commonly used methods are: 

1. The test-retest, by which a group’s performance on a test is correlated
with its performance on a readministration of the same test.

2. The correlation of a group’s performance on alternate forms of the 
same test. 

3. The split-half technique, or correlation of individuals’ scores on odd 
and even items of a test. 

A fourth method, item inter-correlation, is also sometimes employed. 

All these techniques attempt to measure the reliability of a test in terms of
the consistency with which it differentiates the attribute or trait measured,
irrespective of what that attribute or trait is. When alternate forms of a
test are available, the second method is one of the best. The split-half technique
is not entirely satisfactory because the index of reliability derived from
it is not necessarily meaningful over a period of time; it measures only the
consistency of individual differentiation at the time the test is administered.
The first and second methods are often superior in this respect. This is an
important consideration in tests of ability or aptitude. In attitude questionnaires
and personality and interest inventories, however, it may not be so
important, because the psychological qualities involved may be expected to
change more than abilities and aptitudes, at least during the period of growth.
At any rate, the split-half technique has come to be used extensively for estimating
the reliability of an inventory or questionnaire.

Test Reliability by the Method of Test-Retest ($r_{xx}$)

In estimating the reliability of a psychological test by the test-retest method,
the “accuracy” of the differentiation is checked by means of the readministration
of the same test to the same group. The people who take the test
should be chosen randomly from the universe for which the test is designed.
An index of reliability is obtained by correlating the two sets of results. This
correlation coefficient provides a measure of the consistency with which the
differentiation of the test results is maintained over the time between the
two tests. The coefficient itself is symbolized by $r_{xx}$, where $r$ is the
product-moment correlation coefficient, the subscripts $x$ and $x$ standing for
the test variable correlated with itself.

The consistency of individual differentiation over a period of time matters
in a relative rather than an absolute sense. A person might well
achieve a higher score on the second administration of a test, but the real
question from the point of view of test reliability is whether his performance
remains relatively the same—whether his relative position in the test scale 
is unchanged. The product-moment correlation coefficient measures the degree 
of co-variation between two variables irrespective of the absolute size of the 
scores or measurements, because the scales that measure both variables are 
made comparable in terms of units of standard deviation. Consequently, 
product-moment correlation is well adapted to measuring the extent to which 
the relative position on a scale remains unchanged. 
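The claim that only relative position matters can be checked numerically. In this sketch (the scores and the helper names `z_scores` and `r_from_z` are hypothetical), r is computed as the mean product of paired z-scores. A uniform gain of 5 points, or a change of scale, leaves every z-score, and therefore r, unchanged.

```python
from statistics import mean, pstdev


def z_scores(xs):
    """Re-express raw scores in units of standard deviation from the mean."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]


def r_from_z(x, y):
    """Product-moment r as the mean product of paired z-scores."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(zx)


first = [12, 15, 9, 20, 17, 11, 14, 18]   # hypothetical first scores
retest = [s + 5 for s in first]           # everyone gains 5 points
rescaled = [2 * s + 3 for s in first]     # a change of scale as well
# Each value is 1.0 apart from floating-point rounding.
print(r_from_z(first, retest), r_from_z(first, rescaled))
```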

The test-retest method is likely to yield too high rather than too low an 
estimate of reliability because of the possible correlation of “memory factors” 
or of “errors.” That is, unless the time interval between the tests is sufficiently 
long, the responses to many items in the test (whether correct or not) may 
be remembered. This makes for positive correlation. For this reason, it is 
best to use this method only when there can be a sufficient time interval 
between the two administrations of the test. 

Test-Retest Reliability of a Digit-Span Test 

Test-retest reliability will be illustrated by the digit-span test, which has
long been used in Binet intelligence testing.* A coefficient of .86 was obtained
for two administrations, at an interval of about two months, of the same
digit-span test to 142 college students. None of the subjects knew in advance
that the same material would be used for the second test. The reliability was
satisfactory, despite the fact that the subjects were college students
and hence represented a rather restricted range of general ability. With
samples less restricted in this respect, the coefficient should be considerably
higher, well in the .90’s. Nevertheless, the test-retest method was appropriate
for determining the reliability of this type of test material because it was
unlikely that the subjects would remember any particular item from one test
to the other. However, the result may have been influenced by some correlation
between such factors as mnemonic techniques developed during the
first testing and carried over into the second. But the correlation would have
been lowered rather than increased if other individuals employed such
techniques during the second test but not during the first.

* J. G. Peatman and N. M. Locke, “Studies in the Methodology of the Digit-Span Test,”
Archives of Psychology, No. 167, 1934.

This latter point suggests one of the shortcomings of the methods used to
estimate the reliability of any test. Actually, of course, a test has no meaning
as a measuring instrument unless it is considered in relation to a person.
Consequently, its reliability cannot be appraised or evaluated independently
of people’s responses to it. A test might at a particular time yield a highly
consistent differentiation of the abilities or psychological factors tested,
whereas, when given again to the same subjects, the reliability coefficient
might be lowered as the result of factors which are desirable and in themselves
of psychological significance. This is to some extent true of any test
of ability or capacity, but it is particularly true of the various personality
and interest inventories which have been developed for psychological diagnosis,
because the level of ability or capacity generally varies less than personality,
interest, attitude, etc., at least during the period of growth. A high
test-retest coefficient for a personality inventory such as the Bell * or an
interest inventory like the Kuder † would indicate relatively no change in
the attributes inventoried. If the time interval between the two administrations
of the same inventory is long, a high coefficient may be significant independently
of the reliability of the instrument itself. On the other hand, a
low coefficient does not in itself necessarily reflect on the reliability of the
instrument. Changes in the personalities, interests, and attitudes of growing
boys and girls are to be expected. The test-retest method is consequently the
least satisfactory technique for estimating the reliability of personality and
interest inventories.

Test Reliability by the Method of Alternate Forms ‡

A second commonly used method for estimating the reliability of a test
consists in (1) administering one form of the test to a sample, (2) administering
an alternate but equivalent form to the same sample, and (3) correlating
the results on the two forms to measure the reliability. The reliability
coefficient obtained by this method is symbolized by $r_{xx'}$, the subscript $x$
representing the variable measured by the first test, and the subscript $x'$
representing the alternate form of the test. The latter subscript differentiates
this coefficient from the test-retest coefficient ($r_{xx}$).

Reliability coefficients obtained from alternate forms of a test are likely
to be too low rather than too high because of the impossibility of devising
two forms of the same test that are really equivalent. In fact, equivalence is
itself a complicated concept, so far as psychological tests are concerned. Two
tests are equivalent if they differentiate a population in exactly the same
way. However, if this is taken to mean that the individual differentiation
must be consistent, the reasoning is circular, because it implies that the
correlation between the two forms must be perfect, or at least very high. But
this correlation is what is being sought as a measure of the reliability of the
test. In practice, therefore, two forms of a test are considered fairly equivalent
if they yield similar means, variabilities, and distributions for appropriate
samples of the universe for which the test is designed. Two items of a test
are usually considered equivalent if, for a given sample, the task involved is
similar, and a similar test result in terms of errors and correct performances
is obtained for each.

* H. M. Bell, The Adjustment Inventory: Adult Form, Stanford University Press, Stanford
University, 1938.

† G. F. Kuder, Preference Record, Science Research Associates, Chicago, 1942.

‡ Such intelligence tests as the Army Alpha and the revised Stanford-Binet are available
with alternate forms. Often, however, alternate forms of a test are not available.

Test Reliability by the Split-Half Method ($r_{\frac{x}{2}\frac{x}{2}}$)

Perhaps the most widely used method for estimating the reliability of tests 
is the split-half technique. It has the advantage over the other two in that an 
estimate can be obtained from only one administration of a test. This obviates 
the need for an alternate form, and also the possibility of the result being 
distorted by memory factors, as in the test-retest technique. Unfortunately, 
however, the split-half method does not always yield the result actually 
sought. That is, an investigator who wishes to measure the consistency with 
which a test differentiates the relative abilities or capacities of a universe 
over a period of time may find this method inadequate because it yields a
coefficient of reliability for the test only at a particular time. Furthermore, 
the method is subject to manipulation in that the longer the test or the more 
parts it has, the higher the coefficient obtained by this method. That this is 
the case will be clearer after the technique has been described. 

The basic principle underlying the split-half method is the division of each
person’s results into halves, and the correlation of the group’s results for
each half of the test. The basis on which the division is made is, of course,
important. The division is usually made by obtaining each individual’s results
on the odd and on the even items of the test; hence the name odd-even
method of reliability, which is sometimes given this technique. The reliability
coefficient obtained by this method is symbolized by $r_{\frac{x}{2}\frac{x}{2}}$, the first
subscript $\frac{x}{2}$ representing one-half of the test results, and the second
subscript $\frac{x}{2}$ the other half.

This method is similar to the alternate test method, in that each half is 
analogous to an alternate form of a test. However, in the split-half method, 
the two halves are administered not consecutively but simultaneously; i.e., the 
subject does an odd-numbered item, then an even-numbered, etc. 

This method may be exemplified by a vocabulary test administered to a 
group of 181 college students. The test consisted of a list of 80 words to be 
defined in terms of multiple-choice alternatives. Two scores were obtained 
for each subject: the total number of correct responses (1) for the 40 odd- 
numbered items and (2) for the 40 even-numbered items. These results were 
then correlated for the group, a product-moment r of .77 being obtained. 

Spearman-Brown Prophecy Formula * 

The correlation coefficient obtained by the split-half technique provides an
index of reliability which is too low for a test as a whole, since the coefficient
is based upon a bi-variate composed of the two halves of the test rather than
on two whole tests, as in the test-retest method. The reliability of the test
as a whole can be estimated by means of the Spearman-Brown prophecy
formula, which is based on the assumption that increasing the length of a
test is theoretically possible without changing the difficulty, character, or
any other relevant conditions attendant upon the administration of the test.
The generalized formula is as follows:

[17:1] Spearman-Brown prophecy formula

$$r_{(L)} = \frac{L\,r_{xx'}}{1 + (L - 1)\,r_{xx'}}$$

where $L$ symbolizes the ratio between the desired length and the actual
length of the test employed, $r_{xx'}$ symbolizes the reliability coefficient derived
from the administration of alternate forms of the test, and $r_{(L)}$ is the
estimated reliability of the test lengthened $L$ times.

* C. Spearman, “Correlation Calculated from Faulty Data,” British Journal of Psychology,
3:281, 1910; W. Brown, “Some Experimental Results in the Correlation of Mental
Abilities,” 3:299, 1910.


This formula can be used to estimate the reliability of a test whose length
is increased as many times as is desired. Thus, with the split-half correlation
coefficient, $L$ is equal to 2, because the whole test is, of course, twice as long
as either of its halves. Hence, for estimating split-half reliability, the generalized
formula simplifies to the following:

[17:2] Spearman-Brown prophecy formula for estimating the reliability of a test as a whole

$$r_{xx} = \frac{2\,r_{\frac{x}{2}\frac{x}{2}}}{1 + r_{\frac{x}{2}\frac{x}{2}}}$$

Applying this formula to the split-half coefficient of .77 obtained for the
vocabulary test gives the following estimate of the reliability of the vocabulary
test as a whole:

$$r_{xx} = \frac{2(.77)}{1 + .77} = \frac{1.54}{1.77} = .87$$

It is this estimated reliability for the test as a whole that is usually reported 
in the literature as the split-half reliability coefficient. 
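Formulas 17:1 and 17:2 reduce to one small function. This sketch (the name `spearman_brown` is ours, not the text's) reproduces the vocabulary-test estimate above and, by setting L = 100, the item-intercorrelation example given later in this section.

```python
def spearman_brown(r, L):
    """Formula 17:1: estimated reliability of a test L times as long as the
    test (or half-test, or single item) whose reliability coefficient is r.
    Assumes, as the text does, that lengthening changes nothing but length."""
    return L * r / (1 + (L - 1) * r)


# Formula 17:2 is the special case L = 2.
whole_test = spearman_brown(0.77, 2)       # vocabulary test, split-half r = .77
hundred_items = spearman_brown(0.30, 100)  # 100 items, mean inter-item r = .30
print(f"{whole_test:.2f} {hundred_items:.2f}")  # prints 0.87 0.98
```

With L = 1 the function returns r unchanged, which is a quick sanity check on the formula.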


Split-Half Reliability by the Method of Differences for $r_{xx'}$

Earlier (pages 248-249) we presented another method of correlation which 
is often convenient to use to obtain a split-half reliability coefficient. The 
method of differences for r has an advantage over the above procedure in 
that the split-half coefficient can be computed directly from the differences 
between the paired, original odd and even scores. The r coefficient which is 
obtained is for the two halves of the test, and r for the test as a whole can 
be estimated by Formula 17:2. It will be recalled that the method of differences
consists in obtaining the sum of the squared differences between each
individual’s original scores on each half of the test. The greater these differences,
the less reliable the test. The ratio of the squared differences to the
variance of the test as a whole is subtracted from 1.0:

[17:3] Split-half reliability by the method of differences

$$r = 1 - \frac{\Sigma D^2}{N\sigma_x^2}$$

where $D$ is the difference between each person’s scores on the two halves of the
test, squared and summed for the entire group; $N$ is the size of the sample;
and $\sigma_x^2$ is the variance of the distribution of scores for the test as a whole.

The method of sums for r is also convenient to use to obtain r between 
alternate forms of a test, as was done in Table 9:7, page 249. 


Test Reliability by the Method of Item-Intercorrelation 


The reliability of a test can also be estimated by the method of item-intercorrelation.
This technique is more analogous to the split-half procedure than to
either of the other two because it measures the reliability of a test at the
time of its administration. However, this method is cumbersome and is
usually not worth the excessive amount of statistical computation required
unless the intercorrelations between all the items of a test are needed for some
other purpose, as in item analysis.

The intercorrelation of responses to items can be facilitated by means of 
Thurstone’s Diagrams for Tetrachoric Coefficients.* The correct and incorrect 
responses for the items taken two at a time must be cross-tabulated. Once all 
the intercorrelations between items are obtained, the next step is to compute 
the average intercorrelation. If all the coefficients are of about the same order,
they can be averaged directly with little error; but if they vary considerably
(say from .10 to .90), they should be converted to Fisher’s z function before
they are averaged (see Table VI, Appendix B), or else the median intercorrelation
coefficient can be employed instead of the mean of the coefficients.
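Averaging through Fisher's z can be sketched as follows. Instead of looking values up in Table VI, the sketch uses `math.atanh`, which computes the same z = ½ ln((1 + r)/(1 − r)), and `math.tanh` to convert the mean z back to r. The coefficients are hypothetical, and the median, the text's alternative, is printed for comparison.

```python
import math
from statistics import mean, median


def fisher_z_mean(rs):
    """Average correlation coefficients via Fisher's z = atanh(r),
    then transform the mean z back to an r with tanh."""
    return math.tanh(mean(math.atanh(r) for r in rs))


# Hypothetical inter-item coefficients that vary widely.
rs = [0.10, 0.35, 0.60, 0.90]
print(f"simple mean {mean(rs):.3f}, via z {fisher_z_mean(rs):.3f}, "
      f"median {median(rs):.3f}")
```

Because z stretches the high end of the r scale, the z-based average here comes out above the simple mean.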

Once the average intercorrelation coefficient is obtained, the reliability of
the test as a whole can be estimated by the Spearman-Brown prophecy
formula. If a test has 100 items and the average of the intercorrelations between
items is .30, the reliability coefficient of the test as a whole is determined
by estimating $r$ for 100 items, in other words, for a test 100 times as long as a
single item. The Spearman-Brown formula gives the following reliability
coefficient:

$$r_{(100)} = \frac{100(.30)}{1 + 99(.30)} = \frac{30}{30.7} = .98$$


* L. Chesire, M. Saffir, and L. L. Thurstone, Computing Diagrams for the Tetrachoric
Correlation Coefficient, Univ. of Chicago Bookstore, Chicago, 1933.

Effect of Range of Ability on Test Reliability

The correlation coefficient for test reliability is definitely affected by the
range of ability among the individuals tested. The correlation coefficient for
two administrations of the same test obtained from subjects fairly homogeneous
in their abilities will not be as high as it would be if the subjects were
more heterogeneous in this respect. In other words, a psychological test that
differentiates broad ranges of ability fairly well may have little or no value
in differentiating narrow ranges. The finer the differentiating power, the
more reliable the test. But a low reliability coefficient for a result obtained
from a rather homogeneous sample does not mean that the test is entirely
useless. The basic question is whether it differentiates abilities consistently
enough for the situations in which it is to be used.

The following formula makes it possible to estimate the reliability of a
test if the variability of the sample were increased; only the reliability coefficient
for the restricted variable and its standard deviation are required:

[17:4] The effect, on the reliability coefficient, of increasing the variability of the universe

$$r_{XX} = \frac{\sigma_X^2 - \sigma_x^2\,(1 - r_{xx})}{\sigma_X^2}$$

in which $r_{XX}$ is the reliability coefficient of the variable $x$ with its variability
increased; $\sigma_X^2$ is the variance of the increased, hypothetical variable; $\sigma_x^2$ is
the variance obtained from the sample with the restricted range of ability;
and $r_{xx}$ is the reliability coefficient obtained for the sample result.

The use of this formula will be illustrated by the digit-span test data of
Table 9:2. The standard deviation of the test was 1.3 and the test-retest correlation
coefficient was .84. What would be the coefficient of reliability for
this test if the variability were twice as great? The estimated coefficient is
determined as follows:

$$r_{XX} = \frac{(2.6)^2 - (1.3)^2(1 - .84)}{(2.6)^2} = \frac{6.76 - .27}{6.76} = .96$$

Thus, if the test results had been derived from a random sample of a broader
population rather than from college students with their restricted range of
ability, the reliability of the test would have been well in the .90’s.
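Formula 17:4 can be written as a function and checked against the digit-span example. This is a minimal sketch, and the function name is hypothetical. It takes standard deviations, as the worked example does, and squares them. Algebraically, the formula holds the error variance σ²(1 − r) fixed while the total variance grows.

```python
def reliability_wider_range(r_xx, sd_sample, sd_wider):
    """Formula 17:4: reliability expected if the sample's standard deviation
    grew from sd_sample to sd_wider, with the error variance
    sd**2 * (1 - r) held the same in both groups."""
    var_x, var_X = sd_sample ** 2, sd_wider ** 2
    return (var_X - var_x * (1 - r_xx)) / var_X


r_XX = reliability_wider_range(0.84, sd_sample=1.3, sd_wider=2.6)
print(f"{r_XX:.2f}")  # digit-span example: 0.96
```

If the two standard deviations are equal, the function simply returns the original coefficient.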

All the above considerations are important in appraising the reliability of any 
test. If the reliability coefficient for a test has been determined from a sample 
with a rather restricted range of ability, and if the coefficient is relatively 
low, it does not necessarily follow that the test is worthless as a means of 
differentiating consistently the functions or qualities being measured. Thus 
college students, with their restricted ranges of ability, have often been used 
in the evaluation of tests, and consequently, the general value of some tests 
has not been clearly recognized. 

On the other hand, a test which has an estimated reliability coefficient of 
.96 computed on the basis of a sample whose variability is theoretically 
increased is not thereby better than it was originally, so far as its use with 
the original restricted sample is concerned. In other words, the estimate of 
increased reliability provides a basis for judging whether the test will be
satisfactory for differentiating the abilities of a broader range of talent in the
population. 


C. THE DETERMINATION OF TEST VALIDITY 

The problem of validity has two aspects: operational validity and functional
validity, mentioned earlier. Both of them have been used in psychological
measurement, but only in recent years has functional validity received
the attention it deserves.

Operational Validity 

Whether a test developed to measure clerical ability, for example, is sufficiently
valid depends to a great extent upon the way in which the question
concerning its validity is framed. If the test includes a series of tasks, such
as number-checking and name-checking, and is satisfactorily reliable (the
reliability coefficient being .90 or more), a rational appraisal of the operations
involved in the test itself might lead to the conclusion that the test is
valid (operationally) for clerical ability. Whether it will differentiate clerical
ability in actual working situations is, however, another question.

In the operational approach to validity, the specific nature of the tasks or 
functions comprising the test is described, and the test is defined in these 
terms. Unfortunately, however, the logic of this approach is not always 
followed. For example, on the basis of the logic of the operations involved, 
the Minnesota Clerical Test should be called a name- and number-checking test, 
rather than a test of clerical ability. If such a test proves to be a reliable 
yardstick for differentiating ability to perform these kinds of tasks, then, 
by definition, it is operationally valid. But its validity is established only for 
these two operations, and this does not necessarily imply validity for differentiating
clerical ability as a whole.

Functional Validity 

An empirical appraisal of the functional validity of a test consists in determining
whether, in fact, it does differentiate a given ability in actual working
situations and, if so, the degree to which it does. This appraisal requires an 
adequate sample of subjects, and in the case of clerical ability, a measure 
of each subject’s clerical proficiency in the actual working situation, so that 
these criteria can be readily correlated with his results on the clerical 
ability test. A high correlation means that individuals who manifest a great 
deal of clerical ability in their work do well on the test, and that those who 
manifest less clerical ability in their work do not do as well on the test. The 
usefulness of a test whose validity coefficient is .80 or .90 should be apparent. 
Unfortunately, however, no single test of clerical ability has been devised 
which yields a functional validity coefficient as high as this, because success 
in clerical occupations depends upon much more than the ability to perform
two or three relatively simple tasks; it depends upon many kinds of abilities,
as well as on the individual’s personality make-up. 

Test Validities 

An important implication of the functional aspect of validity is the fact 
that a test may be valid for more than one type of situation. That a test 
may have different validities for different situations rather than simply “a 
validity” is well illustrated by the so-called general intelligence test. This 
type of test was originally constructed to differentiate the “intelligence” of 
individuals. From the operational point of view it was assumed to do this 
because it included a variety of functions or tasks which require “intelligence” 
for their successful performance. One leading psychologist seriously proposed, 
in the early 20’s, that intelligence be defined as that which intelligence tests 
measure. From a practical point of view, this reasoning is circular, and 
hardly resolves the problem. As a result of greater emphasis on the problem 
of functional validity since that time, it is recognized today that an intelligence
test may be more valid for some purposes than for others, and that
in any event its validity has a pluralistic rather than only a single aspect.
An intelligence test like the Wechsler-Bellevue has been found to be reliable
and useful in predicting aptitude for various types of work as well as scholastic
aptitude. The validity of this test lies not in its definition as an intelligence
test, but rather in the fact that aptitude for many different types of work or
activity can to some extent be predicted with it.

Validity Criteria: Abilities vs. Aptitudes

The chief problem in appraising the functional validity of a test is to obtain
satisfactory criteria for checking or measuring its validity for a particular
situation. This problem is much more complex for the measurement of aptitudes
than of abilities because aptitudes, by definition, are abilities not yet
fully developed. An aptitude is potential ability and interest, rather than
proficiency after training and experience. If, for example, a test is to be
designed which will satisfactorily differentiate the mechanical aptitude of
high-school students, it must measure capacities for the
development of mechanical abilities.

Aptitude measurement thus involves the determination of whether any 
abilities are present which, when measured, will serve as a basis for the prediction
of later achievements. In a situation of this kind, there are obviously
no observations or measurements of abilities in the actual working situation 
which can be used as criteria for validating the test. Hence, an appropriate 
sample of subjects is set up and tested, and an index of their particular 
ability is obtained later, after they have had an opportunity to develop it. 
With such measures, their actual achievement can be compared with their 
earlier performance on the aptitude test. If the correlation is high enough
for predictive purposes, then and to that degree only is the test valid for
differentiating the particular aptitude. 

Effect of Range of Ability on Test Validity 

We pointed out earlier that a correlation coefficient is definitely affected
by the variability or range of ability characteristic of the sample, and we
presented a formula that permits an estimate of the reliability of a test if 
used with a broader range of ability. Test validity can be estimated similarly. 
Such an estimate is important whenever there is evidence that a test has been 
validated with a sample whose range of ability is more restricted than would 
be characteristic of the general use of the test. For example, many tests have 
been developed and appraised with samples of college students whose range 
of ability is necessarily restricted, at least for some kinds of abilities. 

If a test is to be used with people whose range of ability is twice as great 
as that of the group on which the test was standardized, the validity of the 
test should be considerably greater for individuals with the wider range. This 
can be estimated by the following formula: 

[17:5] The effect, on a validity coefficient, of increasing the variability of the universe

$$r_{cX} = \sqrt{\frac{\sigma_X^2 - \sigma_x^2\,(1 - r_{cx}^2)}{\sigma_X^2}}$$

in which $r_{cX}$ is the coefficient of validity, i.e., the correlation between the
criterion $c$ and the variable of the test $x$ when the variability of the sample
is increased; $\sigma_X^2$ is the variance for the universe of increased variability;
$\sigma_x^2$ is the variance of the test for the universe of the restricted sample; and
$r_{cx}$ is the validity coefficient obtained from the sample. This formula is
based upon the assumption that the standard errors of estimate of both
universes are equal:

$$\sigma_x\sqrt{1 - r_{cx}^2} = \sigma_X\sqrt{1 - r_{cX}^2}$$

If the validity coefficient for the sample is .40, the variance of the test
results for the restricted sample is 5.0, and the variance of the universe is
twice as great, i.e., 10.0, the validity coefficient will be:

$$
r_{cX} = \sqrt{\frac{10.0 - 5.0\,(1 - .40^2)}{10.0}} = \sqrt{.58} = .76
$$
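Formula [17:5] lends itself to a quick numerical check. The following Python sketch is our own illustration rather than part of the original treatment; the function name `validity_wider_range` and its parameter names are hypothetical. Like the formula, it assumes that the standard errors of estimate are equal in the restricted and the wider universe.

```python
from math import sqrt


def validity_wider_range(r_cx: float, var_restricted: float, var_wide: float) -> float:
    """Estimate the validity coefficient for a universe of greater variability.

    Formula [17:5]: r_cX = sqrt((var_X - var_x * (1 - r_cx**2)) / var_X),
    which assumes equal standard errors of estimate in both universes.
    """
    if var_wide < var_restricted:
        raise ValueError("var_wide must be at least as large as var_restricted")
    return sqrt((var_wide - var_restricted * (1 - r_cx ** 2)) / var_wide)


# Worked example from the text: r = .40, variances 5.0 (sample) and 10.0 (universe)
print(round(validity_wider_range(0.40, 5.0, 10.0), 2))  # 0.76
```

When the two variances are equal, the function simply returns the sample coefficient, as it should.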

Test Battery Validity 

In many of the most useful procedures for psychological measurement a
battery of tests, rather than a single test, is used. In such cases the validity
of several tests must be appraised by considering the effectiveness of the
battery of tests as a whole rather than singly. The correlation technique is
used for this, but in appraising the validity of the tests as a whole, simple
correlation is supplemented by multiple correlation. Multiple correlation and
partial correlation are considered briefly in Sections E and F, respectively.

D. TEST ITEM ANALYSIS 

Basic research problems in psychological measurement include the
evaluation not only of tests but also of the items or tasks of a test. The methods to
evaluate test items are basically very similar to those described for appraising
tests.

Item Reliability and Validity 

The logic underlying the evaluation of test items is straightforward, even
though in practice it is often overlooked. Basically, a test item is reliable if
its results correlate highly with the total score of the entire test, provided of
course the test itself is highly reliable. Furthermore, a test item is
operationally valid if it correlates well with the total test which has itself been
demonstrated to have satisfactory operational validity. Finally, a test item is
functionally valid if it correlates highly with an independent criterion. Even
though the test as a whole may not be satisfactory as far as its functional
validity is concerned, each item in it can be correlated with an independent
functional criterion of validity, and the poor items thus eliminated.

The validation of psychological test items is too often approached from 
the point of view of only an internal analysis of operational validity; that is, 
the total test score is too frequently taken as the only criterion of validity. 
The particular items that correlate well with the total score are considered 
relatively valid, and those that correlate weakly or negatively are considered
unsatisfactory or invalid. The soundness of this procedure for the practical 
problems of measurement obviously depends upon the functional validity 
of the test as a whole. 

Item analysis offers a means of developing a standardized test, for it
enables the selection of items of the proper level of difficulty and of those
which yield the best predictions of the criterion.

Biserial and Fourfold Correlation Techniques 

Although the statistical techniques employed for item validations are varied,
they are generally based on biserial correlation or on correlation techniques
for fourfold tables,* because the answers to test items are usually dichotomized
as “True” or “False,” “Correct” or “Incorrect,” “Yes” or “No.” When
operational (internal) validity is to be determined, biserial r is often employed
because the criterion is the distribution of scores for the test as a whole.
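A sketch of biserial r in Python, using the conventional textbook form $r_{bis} = \frac{M_p - M_q}{\sigma_t}\cdot\frac{pq}{y}$ (assumed here; the notation of the earlier chapter may differ). It relies on `statistics.NormalDist` from the standard library for the normal-curve ordinate y. The function name and the data are hypothetical.

```python
from statistics import NormalDist, fmean, pstdev


def biserial_r(item, totals):
    """Biserial r between a dichotomized item (1 = pass, 0 = fail) and total scores.

    Uses r_bis = (M_p - M_q) / sigma_t * (p * q / y), where M_p and M_q are the mean
    total scores of those passing and failing, sigma_t is the standard deviation of
    all total scores, p and q are the proportions passing and failing, and y is the
    ordinate of the unit normal curve at the point dividing it into p and q.
    """
    passed = [t for i, t in zip(item, totals) if i == 1]
    failed = [t for i, t in zip(item, totals) if i == 0]
    if not passed or not failed:
        raise ValueError("item must be passed by some subjects and failed by others")
    p = len(passed) / len(totals)
    q = 1 - p
    unit_normal = NormalDist()  # mean 0, standard deviation 1
    y = unit_normal.pdf(unit_normal.inv_cdf(p))
    return (fmean(passed) - fmean(failed)) / pstdev(totals) * (p * q / y)


# Hypothetical item and total scores for five subjects (illustration only)
print(round(biserial_r([1, 1, 0, 1, 0], [3, 2, 2, 4, 1]), 2))  # 0.91
```

For the same data the biserial r is always larger than the point-biserial r, because it assumes a continuous, normally distributed ability underlies the pass-fail split.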

* Cf. J. C. Flanagan, “General Considerations in the Selection of Test Items and a
Short Method of Estimating the Product-Moment Coefficient from Data at the Tails of the
Distribution,” Journal of Educational Psychology, 30:674-680, 1939; J. P. Guilford, “The
Phi Coefficient and Chi Square as Indices of Item Validity,” Psychometrika, 6:11-19, 1941,
and “A Simple Scoring Weight for Test Items and Its Reliability,” ibid., pp. 367-374.



When functional validity is to be determined, tetrachoric r, the φ coefficient,
or chi-square is usually employed because the independent criterion itself
is often dichotomized into “success” or “failure,” “satisfactory” or
“unsatisfactory,” “x present” or “x absent,” etc.
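For the fourfold tables used with a dichotomized criterion, the φ coefficient and chi-square are closely related: for a 2 × 2 table, $\chi^2 = N\varphi^2$ when no correction for continuity is applied. The Python sketch below illustrates both; the layout of the table, the function names, and the frequencies are our own hypothetical choices. Tetrachoric r requires a more elaborate computation and is not sketched here.

```python
from math import sqrt


def phi_coefficient(a, b, c, d):
    """Phi for a fourfold table of frequencies:

                       criterion success   criterion failure
        item passed            a                   b
        item failed            c                   d
    """
    denominator = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denominator == 0:
        raise ValueError("a row or column total is zero")
    return (a * d - b * c) / denominator


def chi_square_fourfold(a, b, c, d):
    """Chi-square (without Yates' correction) for the same table: N * phi**2."""
    n = a + b + c + d
    return n * phi_coefficient(a, b, c, d) ** 2


# Hypothetical table: 30 passed and succeeded, 10 passed but failed,
# 20 failed the item but succeeded, 40 failed both (illustration only)
print(round(phi_coefficient(30, 10, 20, 40), 2))      # 0.41
print(round(chi_square_fourfold(30, 10, 20, 40), 2))  # 16.67
```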

These methods of correlation have been described in earlier chapters and
consequently will not be described again. Generally, however, test items are
valueless unless their correlation with an adequate criterion is significantly
greater than zero. An item that every subject passes, or that every subject fails,
has no differentiating value and, consequently, no test value. An exception is the
few easy items inserted at the beginning of a test so that the subject will build up
self-confidence.

E. MULTIPLE CORRELATION (R) 

Multiple correlation is a statistical technique that makes it possible to
determine the correlation between two or more variables taken together and
a single variable, for example, the correlation between several tests and a
criterion of proficiency or accomplishment. Multiple correlation not only
yields an over-all single coefficient (symbolized by R) but is valuable for
determining the effectiveness with which a battery of tests can predict the
criterion. The technique is also employed to weight each test in a battery
according to its efficiency in this respect.

Although the computations necessary for a multiple correlation problem
involving more than three variables are considerable and beyond the scope
of this book,* the essence of the technique can be illustrated by a
three-variable problem which requires only relatively simple statistical procedures.

Predicting Academic Success from Two Variables 

On entering college, students were given a scholastic aptitude test
(variable x) and a test of “social intelligence” (variable y). At the end of their
sophomore year, criteria of academic success were obtained in terms of each
student’s average grade for the first two years (variable c, the criterion). The
correlation, $r_{cx}$, between scholastic aptitude test results and academic success
was .60. The correlation, $r_{cy}$, between scores on the social intelligence test
and academic success was .40. The efficiency of prediction for a correlation
of .60 is 20% (see Table 16:2), but only 8% for a correlation of .40.
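Table 16:2 is not reproduced in this chapter. The efficiencies quoted in the text (20% for r = .60 and 8% for r = .40, and elsewhere 13% for r = .50 and 21% for R = .61) agree with the index of forecasting efficiency $E = 1 - \sqrt{1 - r^2} = 1 - k$, so the Python sketch below assumes that is the quantity tabulated. The function name is hypothetical.

```python
from math import sqrt


def forecasting_efficiency(r):
    """Assumed index of forecasting efficiency, E = 1 - sqrt(1 - r**2) = 1 - k."""
    return 1 - sqrt(1 - r ** 2)


for r in (0.40, 0.50, 0.60, 0.61):
    print(f"r = {r:.2f}: E = {forecasting_efficiency(r):.0%}")
# prints E = 8%, 13%, 20%, and 21% in turn
```

The index rises slowly at first: a correlation must reach about .87 before predictions are half as variable as chance guesses.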

The combined predictive efficiency of the two tests cannot be obtained
simply by summing the correlation coefficients, averaging them, and using
this average for an efficiency index. Whether or not the efficiency of prediction
is greater when the two test variables x and y are taken together than when
each is considered separately depends upon their relationships with the
criterion as well as between themselves. That this is the case will be readily seen
if we cite an extreme situation. If the correlation between variable x and the
criterion were 1.00, and that between variable y and the criterion were also
1.00, the second test obviously would not increase the predictive efficiency
of the first. At the other extreme, if each variable correlated zero with the
criterion, the predictive efficiency of the two variables taken together would remain zero.
The predictive possibilities between these extremes of perfect and zero
correlation are varied.

* Cf. C. C. Peters and W. R. Van Voorhis, Statistical Procedures and Their Mathematical
Bases, McGraw-Hill, New York, 1940, chap. 8.

The multiple correlation of two variables with a criterion is computed by
the following formula:

$$
R_{c\cdot xy} = \sqrt{\frac{r_{cx}^2 + r_{cy}^2 - 2\,r_{cx}\,r_{cy}\,r_{xy}}{1 - r_{xy}^2}}
$$

[17:6] Multiple correlation of two variables, x and y, with a criterion, c

where $R_{c\cdot xy}$ symbolizes the multiple correlation of the test variables x and y
with c, the criterion variable; $r_{cx}$ is the correlation of x with the criterion; $r_{cy}$
is the correlation of y with the criterion; and $r_{xy}$ is the correlation between
the two test variables.*

In computing the multiple correlation for the data cited above, we already
have the values of $r_{cx}$ and $r_{cy}$. The only additional information needed is the
correlation between the two test variables themselves, which was .50. To
summarize:

$r_{cx}$ = .60 (correlation of scholastic aptitude test scores with the grade criteria of
academic success)

$r_{cy}$ = .40 (correlation of the social intelligence test scores with the criterion)

$r_{xy}$ = .50 (correlation of scholastic aptitude test scores with the social intelligence
test scores)


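Substituting these three values in the above formula for R gives the following
multiple correlation coefficient:

$$
R_{c\cdot xy} = \sqrt{\frac{(.60)^2 + (.40)^2 - 2(.60)(.40)(.50)}{1 - (.50)^2}} = \sqrt{\frac{.28}{.75}} = \sqrt{.373} = .61
$$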

This multiple correlation coefficient is not significantly different from .60,
the correlation between the scholastic aptitude test and the criterion. The
efficiency of prediction of the multiple R is 21%, as compared with 20% for
$r_{cx}$ alone. That the addition of the y variable, the “social intelligence” test
scores, makes no appreciable difference in the predictive efficiency of the battery
as a whole may be somewhat surprising.

Ideally, an effective battery of tests for predicting a criterion such as 
academic or vocational success would be composed of several tests, each of 
which correlates fairly high with the criterion but hardly at all with the others. 
The greater the intercorrelation of the variables in a battery of tests, the 
smaller the increase in the predictive efficiency of the battery as a whole. 
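Formula [17:6] can be sketched in Python as follows; the function name and its argument order are our own. The guard excludes $r_{xy} = \pm 1$, for which the denominator vanishes.

```python
from math import sqrt


def multiple_r(r_cx, r_cy, r_xy):
    """Formula [17:6]: multiple correlation of tests x and y with criterion c."""
    if abs(r_xy) >= 1:
        raise ValueError("the two tests must not be perfectly correlated")
    return sqrt((r_cx ** 2 + r_cy ** 2 - 2 * r_cx * r_cy * r_xy) / (1 - r_xy ** 2))


print(round(multiple_r(0.60, 0.40, 0.50), 2))  # 0.61 (academic success example)
```

With uncorrelated tests ($r_{xy} = 0$) the formula reduces to $R^2 = r_{cx}^2 + r_{cy}^2$; for instance, `multiple_r(0.40, 0.50, 0.0)` is $\sqrt{.41} \approx .64$. This is why a battery gains most when its tests correlate little with one another.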


* See Table V, Appendix B, for values of $1 - r^2$.


Predicting Clerical Efficiency from Two Variables 

We shall now present a three-variable multiple correlation problem in which
the predictive efficiency of two test variables combined is appreciably greater
than when either is taken alone. The criterion variable was a measure of
clerical proficiency based upon ratings over a period of several months. The test
variables were results on a number-name-checking clerical test (x), and a
vocabulary-arithmetic test of the omnibus type (y), administered as part of
a battery for predicting clerical aptitude. The following correlations were
obtained for these three variables:

$r_{cx}$ = .40 (predictive efficiency = 8%)

$r_{cy}$ = .50 (predictive efficiency = 13%)

$r_{xy}$ = .10

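The multiple R between the two test variables and the criterion is .61,
computed as follows:

$$
R_{c\cdot xy} = \sqrt{\frac{(.40)^2 + (.50)^2 - 2(.40)(.50)(.10)}{1 - (.10)^2}} = \sqrt{\frac{.37}{.99}} = .61
$$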

Whereas the higher of the correlations between the test variables and the
criterion is .50 ($r_{cy}$), the multiple R is .61. For $r_{cx}$ the efficiency of prediction
is 8%, and for $r_{cy}$ it is 13%, whereas for $R_{c\cdot xy}$ it is 21% (Table 16:2). Thus, the
efficiency of prediction of the combined x and y variables is about 1.6 times as
great as it is for variable y alone, which correlated .50 with the criterion. The
two tests combined are more efficient because the correlation between them
was only .10, i.e., little more than zero, and each had a fair correlation with
the criterion.

The Multiple Regression Equation and the Standard Error 
of Estimate for R 

A multiple correlation coefficient not only is valuable for determining the
efficiency of prediction of a battery of tests, but is also of considerable
significance in providing an empirical basis for appropriately weighting each
test. In fact, if the battery is to have the predictive efficiency signified by R,
each test must be weighted on the basis of the multiple correlation. Thus, if
a battery is composed of two tests and Test x has twice the predictive
efficiency of Test y, then x should be given twice as much weight as y in the
total score for the battery. The weights required can be directly obtained
from the regression equation of the multiple correlation coefficient, provided
the equation is expressed in z score form. For a three-variable problem, such
as that in the preceding examples, the regression equation is as follows:

$$
z_c = \frac{r_{cx} - r_{cy}\,r_{xy}}{1 - r_{xy}^2}\,z_x + \frac{r_{cy} - r_{cx}\,r_{xy}}{1 - r_{xy}^2}\,z_y
$$

[17:7] Multiple regression equation for a three-variable problem




For the preceding example, this is:

$$
z_c = \frac{.40 - (.50)(.10)}{1 - (.10)^2}\,z_x + \frac{.50 - (.40)(.10)}{1 - (.10)^2}\,z_y = .35\,z_x + .46\,z_y
$$


The regression coefficient for the x variable is .35, as compared with .46 for
the y variable. Therefore, to obtain the most efficient single total score for a
battery consisting of these two tests, standard (z) scores on Test x should be given
about three-fourths as much weight as standard scores on Test y, since .35/.46 ≈ 3/4.
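A Python sketch of the standard-score regression weights of formula [17:7], applied to the clerical example (the function name is hypothetical). The last lines use a standard identity, not stated in the text, that $R^2$ equals the sum of each weight multiplied by the corresponding validity coefficient; it recovers the multiple R of .61 found earlier.

```python
def beta_weights(r_cx, r_cy, r_xy):
    """Standard-score regression weights of formula [17:7] for tests x and y."""
    denominator = 1 - r_xy ** 2
    beta_x = (r_cx - r_cy * r_xy) / denominator
    beta_y = (r_cy - r_cx * r_xy) / denominator
    return beta_x, beta_y


beta_x, beta_y = beta_weights(0.40, 0.50, 0.10)   # clerical example
print(round(beta_x, 2), round(beta_y, 2))          # 0.35 0.46
print(round(beta_x / beta_y, 2))                   # 0.76, i.e. about 3/4

# Check: R squared equals the sum of each weight times its test's validity
r_squared = beta_x * 0.40 + beta_y * 0.50
print(round(r_squared ** 0.5, 2))                  # 0.61
```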

The error of prediction, however, must also be considered. The standard
error of estimate for R has the same form as the standard error of estimate for the
correlation between two variables:

$$
\sigma_{c\cdot 1234\ldots n} = \sigma_c \sqrt{1 - R_{c\cdot 1234\ldots n}^2}
$$

[17:8] Standard error of estimate of a multiple correlation coefficient


where $R_{c\cdot 1234\ldots n}$ is the multiple correlation between any number of variables
and the criterion, c. When R is .61, as in the preceding examples, the standard
error of estimate is 79% as large as the standard deviation of the distribution
of criterion scores, $\sigma_c$. Hence the efficiency of prediction is 21%. (See Table 16:2.)


F. PARTIAL CORRELATION 

Partial correlation is a statistical technique which, in appropriate
circumstances, can yield information otherwise obtainable only by experimental
methods. The technique is based upon the assumption that the effect of a
variable on a bivariate relationship can be held constant. This assumption
implies that the variable represents a relatively unitary function or trait.
Because such an assumption holds only rarely for psychological test variables,
the use of the technique is limited. But there is one variable, age, which has
been assumed to be legitimately subject to the technique. Psychologically,
the age of individuals provides an index, though a very rough one, of maturity.

Partial Correlation with Scholastic Aptitude Held Constant 

To illustrate the technique of partial correlation for a three-variable
problem in which one variable is to be held constant, we shall use the data of the
multiple correlation of the scholastic aptitude scores (x) and “social intelligence”
test scores (y) with the grade criteria of academic success (c). These three
variables were correlated as follows:

$r_{cx}$ = .60    $r_{xy}$ = .50

$r_{cy}$ = .40    $R_{c\cdot xy}$ = .61

We saw above that the addition of the y variable did not appreciably increase
the predictive efficiency. We can reverse the procedure and, by partial
correlation, estimate the correlation between the “social intelligence” test scores
(y) and academic success ($r_{cy}$), with the effect of the scholastic aptitude test
results (x) held constant. We can estimate what it would be from the following
formula:*

$$
r_{cy\cdot x} = \frac{r_{cy} - r_{cx}\,r_{xy}}{\sqrt{1 - r_{cx}^2}\,\sqrt{1 - r_{xy}^2}}
$$

[17:9] Partial correlation coefficient for a three-variable problem


The correlation between academic success and social intelligence, with the
scholastic aptitude variable held constant, is as follows:

$$
r_{cy\cdot x} = \frac{.40 - (.60)(.50)}{\sqrt{1 - (.60)^2}\,\sqrt{1 - (.50)^2}} = \frac{.10}{(.80)(.866)} = .14
$$

The correlation between the social intelligence test results and academic
success is reduced from .40 to .14 when the effect of the scholastic aptitude
test results is held constant. This was of course expected in view of the
multiple correlational analysis already made. The partial correlation coefficient
shows that the residue of factors in the “social intelligence” test, beyond
those involved in the relationship between the scholastic aptitude test and the
criterion, amounts to very little. Social intelligence may have some relation
to academic success, but this particular test, as standardized and scored,
shows that the relationship is not of much consequence.
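Formula [17:9] in Python, as a sketch. The function name and the numbering of the variables are our own convention: `partial_r(r_12, r_13, r_23)` returns the correlation of variables 1 and 2 with variable 3 held constant.

```python
from math import sqrt


def partial_r(r_12, r_13, r_23):
    """Formula [17:9]: correlation of variables 1 and 2 with variable 3 held constant."""
    return (r_12 - r_13 * r_23) / (sqrt(1 - r_13 ** 2) * sqrt(1 - r_23 ** 2))


# Academic success (1 = c) and social intelligence (2 = y),
# with scholastic aptitude (3 = x) held constant
print(round(partial_r(0.40, 0.60, 0.50), 2))  # 0.14
```

If the third variable is uncorrelated with both of the others, the partial coefficient equals the original correlation.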


Partial Correlation with Age Held Constant 

That there is a relationship between test achievement and age is well
known. Unfortunately, however, this fact is sometimes neglected in reported
correlations between psychological variables. Consider, for
example, a correlation of .50 between vocabulary test scores (variable x) and
arithmetic test scores (variable y) for a sample of boys and girls ranging in
age from 9 to 12 years. At least part of the correlation is doubtless attributable
to this heterogeneity in age. That is, because of their greater maturity
the older boys and girls should accomplish more than the younger children on
both tests. These differences in age should therefore increase the correlation.

The basic problem is that of estimating the correlation between vocabulary 
and arithmetic ability if the sample were all the same age. There are two 
approaches to this problem. The better procedure consists in using samples 
that are more homogeneous in age. Thus, four or five samples might be used 
in order to show the correlation between vocabulary and arithmetic ability 
for 12-year-olds, for 11-year-olds, for 10-year-olds, etc. When this approach 
is inconvenient, as it often is, an estimate of the correlation, if all the children 
had been of the same age, can be obtained by means of partial correlation. 

* The terms in the denominator are coefficients of alienation, k. Hence Table V,
Appendix B, considerably simplifies the computations.


In other words, an estimate of the correlation between x (vocabulary test)
and y (arithmetic test) can be obtained with a (age) held constant. The data
are as follows:

$r_{xy}$ = .50 (correlation between vocabulary test scores and arithmetic test scores
for the heterogeneous age group)

$r_{xa}$ = .65 (correlation between vocabulary test scores and age of subjects)

$r_{ya}$ = .55 (correlation between arithmetic test scores and age of subjects)

When age is not held constant, the correlation between x and y is .50. But
when it is held constant, the correlation is considerably reduced:

$$
r_{xy\cdot a} = \frac{r_{xy} - r_{xa}\,r_{ya}}{\sqrt{1 - r_{xa}^2}\,\sqrt{1 - r_{ya}^2}} = \frac{.50 - (.65)(.55)}{\sqrt{1 - (.65)^2}\,\sqrt{1 - (.55)^2}} = .22
$$

Thus, the correlation between vocabulary and arithmetic ability would more
likely have been about .22 instead of .50 if the subjects had been of the same
age.


Spurious Correlation 

The difference between partial correlation coefficients and original
correlation coefficients (.22 and .50, respectively, in the preceding example) has
sometimes been held to illustrate the difference between non-spurious and
spurious correlation. However, to call the correlation of .50 spurious is
misleading. Instead, the effect of age on the result should be ascertained and the
partial coefficient interpreted accordingly. After all, age has a very real
effect on the correlation between vocabulary and arithmetic ability; there
is nothing spurious about it. The real problem involves determining the
effect of such a factor, and for this the technique of partial correlation
provides a statistical short-cut. The introduction of experimental techniques in
the original design of an investigation gives a sounder basis for determining
the role of otherwise “hidden” factors that may make the correlation
coefficient between two variables higher than it otherwise would be.

EXERCISES 

1. What statistical considerations enter into the evaluation of a psychological test? 

2. Define test reliability and describe the methods used to determine it, indicating 
the relative advantages and disadvantages of each. 

3. Define test validity, distinguish between the operational and functional validity 
of a test, and describe methods used to obtain indexes of validity. 

4. How does a critical score replace a regression equation in the use of test results 
for predicting success and failure? 

5. Describe how a test may have more than one index of validity. 



6. What is the effect of the range of ability of the sample on (a) the reliability
coefficient, and (b) an index of validity?

7. What statistical techniques are used in test item analysis? Describe three
situations in which different statistical techniques are employed.

8. What does the multiple correlation coefficient measure, and how is multiple
correlation utilized in evaluating a battery of tests?

9. What is a partial correlation coefficient? Of what value is the technique of partial 
correlation in research? 



CHAPTER 18 


Cluster and Factor Analysis 


A. THEORY OF THE ORGANIZATION OF HUMAN TRAITS 

Statistical methods of cluster or factor analysis represent a significant
development during the past several decades in the appraisal and evaluation
of psychometric procedures. We have seen that multiple correlation makes 
it possible to determine the predictive efficiency of a battery of tests and to 
weight each test on the basis of its efficiency for predicting a criterion. Factor 
analysis, on the other hand, gives a basis for insight into the organizational 
role of the traits, abilities, etc., which enter into performance on a series of 
tests. This is not to say that multiple correlation is not useful for evaluating 
a test battery; on the contrary, a most significant step in evaluating any test 
of ability or aptitude is the determination of its functional validity in the 
light of an independent empirical criterion. From an empirical point of view, 
the latter is just as important as a factor analysis of the abilities operating in 
a battery of tests. Nevertheless, a picture of the way in which abilities are 
organized provides valuable insights for the construction and administration 
of tests. 

The Coefficient of Determination (r²)

The principle underlying factor analysis is the association of component
factors in two or more correlated variables. For example, when the
correlation of two variables is significantly greater than zero, the non-chance factors
that account for the correlation are common to both. Thus, the correlation
between weight and height is accounted for by factors of organic
development common to both of these variables. Similarly, the correlation between
scholastic aptitude and academic achievement is accounted for by factors
of intellectual development which are common to both of these variables.

Correlations significantly greater than zero can be described as causal
relationships provided there is a logical basis for the association. Causation itself,
however, is complex. For one thing, the determining factors underlying a
causal relationship are rarely, if ever, simple. Furthermore, the factors
accountable for a causal relation between two variables, x and y, may
operate mainly in one direction, or in several directions: (1) y may be a function
of x; (2) x may be a function of y; (3) both may be a function, reciprocally, of
each other; or (4) both may be a function of a third set of factors. The fourth
is illustrated by the correlation between vocabulary and arithmetic ability of
children heterogeneous in age, referred to in Chapter 17.



The proportion or percentage of the variance (standard deviation squared)
of one variable that is associated with the variance of another variable can be
estimated from a coefficient that is usually called the coefficient of
determination, and is the square of the correlation between the two variables, $r^2$.

If the correlation between scholastic aptitude (x) and academic achievement
(y) for a given universe is .60, then $(.60)^2$,* or .36, is the proportion of the
variance of x that is associated with the variance of y. If academic achievement
could be logically assumed to be a function of scholastic aptitude, and not
vice versa, then 36% of the variance characteristic of academic achievement
would be determined by factors measured by the scholastic aptitude test.
Generally, however, such an assumption of causality from one variable to
another is hazardous. Consequently, it is preferable to interpret a coefficient
of determination as measuring the proportion or percentage of factors that
the two variables have in common. Thus, there is a perfect correlation between
the circumferences and diameters of circles; therefore $r^2$ equals 1.00, and 100%
of the variance of either variable is associated with, or a function of, the
variance of the other. Both circumference and diameter are properties of
“circularity”; neither circumference nor diameter is the causal determinant
of the other. The association is invariant, i.e., r is 1.00, but both variables
are “determined” by the character of the whole of which they are integral
aspects or properties.

The Coefficient of Non-Determination 

The proportion or percentage of the variance of one variable not accounted
for by a second variable that is correlated with it to some degree is measured
by the coefficient of non-determination. This is $k^2$, the square of k, the
coefficient of alienation; since $k = \sqrt{1 - r^2}$, it equals $1 - r^2$.†
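The three coefficients are related by $r^2 + k^2 = 1$. A short Python sketch with the r of .60 used above (the function names are hypothetical):

```python
from math import sqrt


def determination(r):
    """Coefficient of determination: the proportion of variance in common, r**2."""
    return r ** 2


def alienation(r):
    """Coefficient of alienation, k = sqrt(1 - r**2)."""
    return sqrt(1 - r ** 2)


def non_determination(r):
    """Coefficient of non-determination, k**2 = 1 - r**2."""
    return alienation(r) ** 2


r = 0.60  # scholastic aptitude vs. academic achievement, from the text
print(round(determination(r), 2), round(alienation(r), 2), round(non_determination(r), 2))
# 0.36 0.8 0.64
```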


Spearman’s Two-Factor Theory 

Interest in how psychological abilities may be organized dates back to
Spearman’s early work in England.‡ Spearman advanced a theory which came to be
known as the two-factor theory, and which held that all human abilities are
basically dependent upon (1) a factor of general mental energy and (2)
abilities specific to each kind of task situation. The general factor he symbolized
as G, and the second, being pluralistic, he symbolized as s for specific factors.
He developed the thesis that the extent to which an individual manifests the
general factor, G, is a function of his heredity, whereas the specific factors,
$s_1, s_2, s_3, \ldots, s_n$, represent his specific acquisition of learning and experience.


* See Table V, Appendix B, for $r^2$ for given values of r.
† See Table V, Appendix B, for $1 - r^2$, or $k^2$, for given values of r.
‡ C. Spearman, “The Proof and Measurement of Association Between Two Things,”
American Journal of Psychology, 15:72-101, 1904. See also his The Abilities of Man,
Macmillan, New York, 1927, and Psychology Through the Ages, Macmillan, New York, 1938.


Spearman defined intelligence in terms of G, and consequently encouraged the
development of intelligence tests which would differentiate an individual’s
abilities with respect to his G capacity. He believed that the following three
psychological functions were most directly indicative of G: (1) introspective
capacity; (2) the eduction of relations; and (3) the eduction of correlates.
He further contended that the so-called general intelligence tests measure
the latter two functions fairly well but do not adequately measure the first.
In papers presented over a period of years, Spearman and his students
attempted to educe empirical evidence in support of the two-factor theory.

Multiple-Factor Theories 

As additional techniques of statistical analysis were developed in the
United States by Kelley,* Hotelling,† and Thurstone,‡ Spearman’s two-factor
theory was found to be increasingly unsatisfactory as an explanation of the
organization of human abilities. In fact, Spearman himself, in later works,
recognized that a unitary G factor would not entirely account for the way in
which human abilities manifest themselves. The outcome of his own work,
as well as that of American investigators, was the development of a
multiple-factor theory which postulated group factors in addition to a possible general
factor and specific factors. Group factors are psychological functions common
to a number of behavior situations but not to all. The general factor, G (we
prefer the term common factors), represents psychological functions common
to all situations that demand mental activities. The specific factors, s,
represent psychological functions peculiar to a particular situation. Whether or
not these three types of factors represent innate or acquired abilities is of
course beside the point here, for the present discussion is concerned only
with the organization and interrelation of psychological functions or “factors.”

Sampling Theory and Cluster Analysis 

Thomson§ and Tryon,|| following the lead of E. L. Thorndike, have
contended that the fundamental functions or factors underlying human
behavior are practically infinite in number and relatively independent of each
other. The problem in connection with a group of tests is to determine how
this myriad of factors is sampled, or drawn upon, in the functioning of
behavior. Cluster analysis, a statistical technique devised by Tryon, enables

* T. L. Kelley, Cross Roads in the Mind of Man, Stanford Univ. Press, Stanford
University, 1928.

† H. Hotelling, “Analysis of a Complex of Statistical Variables into Principal
Components,” Journal of Educational Psychology, 24:417-441, 498-520, 1933.

‡ L. L. Thurstone, Vectors of the Mind, Univ. of Chicago Press, Chicago, 1935; and also,
Primary Mental Abilities, Psychometrika Monograph, 1938.

§ G. H. Thomson, The Factorial Analysis of Human Ability, Houghton Mifflin, Boston,
1939.

|| R. C. Tryon, Cluster Analysis, Correlation Profile and Orthometric (Factor) Analysis for
the Isolation of Unities in Mind and Personality, Edwards Bros., Ann Arbor, 1939.


the investigator to determine how psychological functions such as abilities, 
attitudes, etc., manifest themselves and are interrelated and organized. 

Once the facts concerning the organization of psychological functions are 
ascertained, the theoretical and practical implications for the development 
and use of psychological tests will be clear. However, not all the facts are 
known,* and therefore factorial analysis should be viewed as a useful statistical 
technique for the evaluation and appraisal of test batteries as they are 
developed. 

These various theories of mental organization have practical implications 
for testing procedures. Thus, if Spearman’s two-factor theory were adequate, 
psychologists should concentrate on developing a battery of tests that would 
give as reliable and valid a differentiation of G as is possible. For example, 
some investigators have used the I.Q. as if it were the answer. 

Thurstone speaks of primary mental factors such as memory, word relations, 
number relations, etc. A well-rounded test battery designed to measure such 
functions should include at least one reliable test for measuring each one. 
Furthermore, by the theory of multiple factors, the results of each test in 
such a battery would not be pooled in order to obtain a composite single score 
but would be kept separate so as to yield a measure for each function.
Nevertheless, there is still the question of the practical implications of such a battery
for educational and occupational situations. As has been indicated, the 
practical value must be determined through the evaluation of test results 
in relation to independent criteria of proficiency or success in the life situation. 

B. METHODS OF FACTOR ANALYSIS 

The methods of factor analysis which have been used generally in the
United States during the past decade are principally those devised by
Thurstone† and Hotelling.‡ However, they require complicated mathematical
computations and are beyond the scope of this book. Fortunately, correlation
profile analysis, a relatively simple technique developed by Tryon,§ makes
it possible to determine whether the intercorrelations between a group of
variables can be explained more satisfactorily in terms of a set of functions
common to all of them or in terms of two or more sets of functions (group
factors), or whether neither of these explanations is adequate.

Tryon’s Method of Correlation Profile Analysis 

Tryon’s approach to the organization of human abilities, traits, etc., intro¬ 
duces such concepts as “components,” “operational unities,” and “clusters.” 
He emphasizes that all a research worker can achieve is to “discover general 

* Cf. E. E. Gureton, **The Principal Compulsions of Factor-Analysts,*’ Harvard Educa¬ 
tional Review^ 9:287-'295,1939. 

t L. L. Thurstone, op. cit, % H. Hotelling, op. eU, § R. G. Tryon, op. eU» 



METHODS OF FACTOR ANALYSIS 


493 


components which act as if they were common determiners in different be¬ 
haviors. Such common determiners are called operational unities. These are 
defined as those component factors which operate when two or more variables 
show the same pattern of correlation coefficients with all the other variables 
of a group. Two variables, A and B, are said to be wholly or partially deter¬ 
mined by an operational unity if both correlate highly with variable M, low 
with iV, intermediate with 0, and so on throughout the other variables. In 
such a case, clearly what is common to A is common to B, since they are 
behaving in an identical and unitary fashion. Correlation profile analysis is a 
simple method for discovering and grouping together variables which have 
identical patterns or profiles of correlations.” * 

Cluster Analysis of Body Measurements 

The intercorrelation coefficients among the following 12 variables, measures of body dimensions, will be used to illustrate Tryon’s correlation profile analysis:†

A. Waist height
B. Hip height
C. Weight
D. Stature
E. Cervical height
F. Tibial height
G. Bitrochanteric diameter
H. Waist girth
I. Hip girth
J. Upper arm girth
K. Posterior arm length
L. Thigh girth

Body measurements for each of these 12 variables were obtained from 
32,165 boys aged 4 to 14, and from 31,919 girls of a similar age range; only 
the data for the boys are presented here. At least part of the correlation 
between any two of these variables will be due to the sample’s heterogeneity 
in age. Since the correlation of each variable with age was included in the 
original article, we have been able to hold constant the effect of age by means 
of partial correlation. 
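A minimal Python sketch of the first-order partial correlation used here to hold age constant, r₁₂.₃ = (r₁₂ − r₁₃r₂₃) / √[(1 − r₁₃²)(1 − r₂₃²)], where variable 3 is the one held constant. The function name and the three input coefficients are hypothetical; they are not taken from Table 18:1.

```python
import math

def partial_r(r12: float, r13: float, r23: float) -> float:
    """First-order partial correlation of variables 1 and 2 with variable 3 held constant."""
    denom = math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))
    if denom == 0:
        raise ValueError("variable 3 is perfectly correlated with variable 1 or 2")
    return (r12 - r13 * r23) / denom

# Hypothetical case: two body measurements correlate .90 with each other,
# and each correlates .80 with age.
print(round(partial_r(r12=0.90, r13=0.80, r23=0.80), 3))
```

With both measurements correlated .80 with age, an apparent coefficient of .90 falls to about .72 once age is held constant, which illustrates why the partial coefficients are generally lower than the original ones.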

The 66 intercorrelations of these 12 variables are presented in Table 18:1. The intercorrelations above the diagonal are the coefficients presented in the original article, with the effect of age heterogeneity included; those below the diagonal are the partial coefficients with the effect of age held constant.

Table 18:1. Intercorrelations of Age and Twelve Body Measurements for 32,165 Boys Aged 4 to 14. (Above the diagonal: original coefficients, age heterogeneity included. Below the diagonal: partial coefficients, age held constant.)

An examination of this table may not be particularly informative at first. It will be noted, however, that the coefficients below the diagonal are generally lower than those above the diagonal. Furthermore, closer inspection of the latter coefficients reveals that the correlations between each measurement and age (the first row) are generally lower than the intercorrelations among the 12 variables themselves, and that all the correlations are positive and fairly high. Tryon’s correlation profile analysis will give a graphic picture of the interrelationships of these variables. We shall, of course, use the partial coefficients below the diagonal.

* Ibid., p. 2, n.

† The data for this analysis were obtained from a study, “Children’s Body Measurements for Sizing Garments and Patterns,” made by R. O’Brien and M. A. Girshick, U. S. Department of Agriculture, Miscellaneous Publications No. 36, 1939.

The Correlation Profile 

Fig. 18:1 is a correlation profile chart which brings together the relationships among the 12 body measurements. The graph is constructed by plotting in succession the correlation of each variable with all the other variables. The value of each coefficient is plotted on the ordinate, and the 12 variables, A to L, are scaled in equally spaced intervals on the abscissa. There are thus 12 line graphs in the figure, one for each variable. Each line graph is continuous, except where the variable is correlated with itself; self-correlation is the reliability coefficient and is not shown.*

Fig. 18:1. Correlation Profiles for Twelve Body Measurement Variables. (Data from Table 18:1)


* Tryon's method is developed on the assumption that the reliability of the method used 
for measuring each variable is high, and that at least one of the correlations of each variable 
with the others is significantly greater than zero. In the intercorrelations in Fig. 18:1, all 
the coefficients are significantly greater than zero. 
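One simple way to turn correlation profiles into numbers is sketched below in Python. Each profile is a variable’s row of the correlation matrix with its self-correlation left out, and the agreement of two profiles (whether they “rise and fall together”) is measured by an ordinary Pearson r computed across the remaining variables. That congruence index is an assumption for illustration, not necessarily the index Tryon used, and the 6-variable matrix is hypothetical, not Table 18:1.

```python
import math

def pearson(x, y):
    """Ordinary Pearson r between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def profile(r, j):
    """Correlation profile of variable j: its row of r, self-correlation omitted (None)."""
    return [None if k == j else r[j][k] for k in range(len(r))]

def congruence(r, a, b):
    """Pearson r between the profiles of a and b over the variables other than a and b."""
    others = [k for k in range(len(r)) if k not in (a, b)]
    return pearson([r[a][k] for k in others], [r[b][k] for k in others])

# Hypothetical matrix: variables 0-2 behave like "length" measures,
# 3-5 like "girth" measures.  The diagonal is a placeholder only.
R = [
    [1.00, 0.90, 0.88, 0.55, 0.50, 0.45],
    [0.90, 1.00, 0.86, 0.52, 0.48, 0.44],
    [0.88, 0.86, 1.00, 0.50, 0.47, 0.40],
    [0.55, 0.52, 0.50, 1.00, 0.80, 0.75],
    [0.50, 0.48, 0.47, 0.80, 1.00, 0.70],
    [0.45, 0.44, 0.40, 0.75, 0.70, 1.00],
]

print(profile(R, 0))
print(round(congruence(R, 0, 1), 2), round(congruence(R, 0, 3), 2))
```

Two variables of the same kind give a congruence near +1, while a length variable and a girth variable give a negative value, because one profile peaks where the other dips.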



CLUSTER AND FACTOR ANALYSIS


Ordinarily, in correlation profile analysis, the profiles of all the variables 
are not drawn on a single graph; rather, a systematic method for analyzing 
the table of intercorrelations is used which permits the investigator to plot 
on separate graphs each group of variables most likely to cluster together.* 
However, when only 12 variables are used, the correlation profile of each can 
be drawn on one graph. Fig. 18:1 shows which of the variables, if any, form one 
or more operational unities. 
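Tryon’s systematic procedure for sorting a large table of intercorrelations is not reproduced in this chapter. As a stand-in, the Python sketch below applies a hypothetical greedy rule. A variable joins an existing group only if its profile agrees with every member’s profile at or above a chosen threshold. The threshold of 0.8 is an arbitrary assumption. Profile agreement is measured by a Pearson r across the other variables, and the 6-variable matrix is hypothetical, not Table 18:1.

```python
import math

def pearson(x, y):
    """Ordinary Pearson r between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def congruence(r, a, b):
    """Pearson r between the profiles of a and b, leaving out a and b themselves."""
    others = [k for k in range(len(r)) if k not in (a, b)]
    return pearson([r[a][k] for k in others], [r[b][k] for k in others])

def group_by_profiles(r, labels, threshold=0.8):
    """Hypothetical greedy rule: a variable joins the first group in which every
    member's profile is congruent with its own at or above `threshold`;
    otherwise it starts a new group."""
    groups = []
    for j in range(len(r)):
        for group in groups:
            if all(congruence(r, j, k) >= threshold for k in group):
                group.append(j)
                break
        else:  # no existing group accepted variable j
            groups.append([j])
    return [[labels[k] for k in g] for g in groups]

R = [
    [1.00, 0.90, 0.88, 0.55, 0.50, 0.45],
    [0.90, 1.00, 0.86, 0.52, 0.48, 0.44],
    [0.88, 0.86, 1.00, 0.50, 0.47, 0.40],
    [0.55, 0.52, 0.50, 1.00, 0.80, 0.75],
    [0.50, 0.48, 0.47, 0.80, 1.00, 0.70],
    [0.45, 0.44, 0.40, 0.75, 0.70, 1.00],
]
print(group_by_profiles(R, ["L1", "L2", "L3", "G1", "G2", "G3"]))
```

The rule depends on the order of the variables and the threshold is arbitrary. It is therefore only a rough screening device, and any groups it suggests should still be checked by inspecting the profiles themselves, as the text does.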

The implications of the correlation profiles in this figure are rather clear. It will be observed that the following six variables (shown by solid lines in the figure) have similar correlation profiles, and hence constitute an operational unity which we shall call Cluster I:

Cluster I 

A. Waist height 

B. Hip height 

D. Stature 

E. Cervical height 

F. Tibial height 

K. Posterior arm length

The trend of the correlations between each of these six variables and all of the other variables is similar; that is, all the line graphs for these six rise and fall together.

The remaining six variables also have correlation profiles similar to each 
other, as shown by the broken lines in the figure; we shall call these six 
Cluster II. 

Cluster II 

C. Weight 

G. Bitrochanteric diameter 

H. Waist girth 

I. Hip girth

J. Upper arm girth 

L. Thigh girth 

The curves of these six variables rise and fall together, even though the actual coefficients are not as similar as were those in Cluster I. However, the six correlation profiles for Cluster II are not only similar to one another but, in contrast to those in Cluster I, have a different pattern, and hence provide the basis for a second operational unity.

Examination of the variables in Cluster I reveals that all are measures of length, whereas all those in Cluster II are measures of volume or girth. This empirical result supports the hypothesis that at least two significantly different physical dimensions, length and volume or girth, must be taken into account in measurements of body build.

* R. C. Tryon, op. cit., pp. 4-6.


The line graphs in Fig. 18:1 are thus evidence for the existence of two major 
operational unities among the 12 variables. But other implications are evident from further inspection of these profiles. For example, the correlation
profiles of Cluster I are more congruent than are those of Cluster II. This is 
consistent, of course, with the information on body development and body 
build revealed by many independent investigations: Growing children, as 
well as adults, are more variable with respect to measures of volume and 
girth than to measures of length. That the six volume or girth variables 
signify greater relative variability among themselves than the six length 
variables is evidenced by the greater range of their intercorrelations with any 
single variable. 

The correlation profiles also suggest (although they do not necessarily 
demonstrate) which variable should provide the best single measure of each 
cluster. In each case, they will be the variables which are at the peak in their 
respective correlation profiles. The variable which will provide the best 
measure of the length functions measured by Cluster I as a whole is A (waist 
height), B (hip height), D (stature), or E (cervical height). From only an 
inspection of the correlation profiles, there is not much basis for selecting one 
of these four, for any of them apparently represents the cluster as well as the 
other three. The average of the intercorrelations of each of these four variables with the remaining five in the cluster is about the same, approximately
.90. The one best variable could be determined by means of more complicated 
mathematical methods; * but from a practical point of view, the variable that 
can be most readily and reliably obtained would be chosen. Since stature is 
the most practical of the four for ordinary measurement, it would be the one 
to be used. 

The choice among the six variables in Cluster II would lie between C (weight) and I (hip girth). These two have the highest average intercorrelations with the other variables in the cluster, although L (thigh girth) also is fairly high, with an average intercorrelation of about .85. A mathematical factor analysis will reveal whether there are any significant differences in the predictive efficiency of one of these variables over the others; but since weight is the most readily obtained in ordinary situations, it would be chosen.
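The “average intercorrelation” criterion used for both clusters can be computed directly. In the Python sketch below, each member’s mean correlation with the other members of its cluster is found and the highest is reported. The cluster is a small hypothetical one, not the coefficients of Table 18:1. As the text stresses, near-ties on this index are settled on practical grounds, which is why stature and weight are preferred.

```python
def mean_within(r, cluster):
    """Mean correlation of each cluster member with the other members."""
    return {
        j: sum(r[j][k] for k in cluster if k != j) / (len(cluster) - 1)
        for j in cluster
    }

def best_representative(r, cluster):
    """Member with the highest mean within-cluster correlation."""
    means = mean_within(r, cluster)
    return max(means, key=means.get)

# Hypothetical three-variable cluster (diagonal entries are not used).
R = [
    [1.00, 0.90, 0.88],
    [0.90, 1.00, 0.86],
    [0.88, 0.86, 1.00],
]
means = mean_within(R, [0, 1, 2])
print({j: round(m, 3) for j, m in means.items()})
print(best_representative(R, [0, 1, 2]))
```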

The application of correlation profile analysis to the intercorrelations of the 
12 variables thus not only reveals two significantly different operational 
unities or clusters, but also suggests that from a practical point of view 
stature and weight best represent each operational unity in Clusters I and II 
respectively. 

The intercorrelations have a further implication, viz., that all 12 variables 
measure certain factors or functions common to all 12. This is evidenced by 
the fact that all the intercorrelation coefficients are positive and fairly high. 


* Tryon’s orthometric analysis serves this purpose; such an analysis could also be made by other methods of factor analysis.


An inspection of Fig. 18:1 indicates that, on the average, the six variables in Cluster I have correlations of from about .50 to about .70 with the six variables in Cluster II, and that the six variables in Cluster II have average correlations of from about .50 to about .65 with the six variables in Cluster I. In other words, the implication is clear that there is a communality of functions for all 12 variables; this is consistent with the fact that all of them represent measures of body build. Although two important physical dimensions are differentiable, an organic unity nevertheless underlies the interrelations of all these variables.
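The between-cluster averages can be obtained from the block of the correlation matrix that crosses the two clusters. In this Python sketch the block is hypothetical, with rows standing for Cluster I variables and columns for Cluster II variables. Row means give each Cluster I variable’s average correlation with Cluster II, and column means give the reverse. Uniformly positive averages are the pattern the text interprets as a communality underlying all the variables.

```python
def mean_cross(cross):
    """cross[i][k] = correlation of variable i of one cluster with variable k of
    the other cluster.  Returns (row means, column means)."""
    row_means = [sum(row) / len(row) for row in cross]
    col_means = [sum(col) / len(col) for col in zip(*cross)]
    return row_means, col_means

# Hypothetical cross-cluster block (3 length variables x 3 girth variables).
cross = [
    [0.55, 0.50, 0.45],
    [0.52, 0.48, 0.44],
    [0.50, 0.47, 0.40],
]
rows, cols = mean_cross(cross)
print([round(v, 3) for v in rows], [round(v, 3) for v in cols])
print("all positive:", all(v > 0 for v in rows + cols))
```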

In summary, the correlation profile analysis indicates that the various 
factors underlying these measurements are interrelated and organized as 
follows: 

Common Factors: Factors or functions common to all 12 measurements of body 
build. 

Length Factors: Cluster I—factors or functions common to all the measures of 
length, but not to measures of volume or girth. 

Volume Factors: Cluster II—factors or functions common to all measures of 
volume or girth, but not to measures of length. 

Specific Factors: Factors or functions common to only a particular variable but 
not to any other. 

With respect to the organization of mental functions rather than body 
measures, the common factors would be somewhat analogous to Spearman’s 
G factor; the length and volume factors would be analogous to the group 
factors of multiple-factor theory; and the specific factors would be analogous 
to those referred to by Spearman. However, this analysis is not based on the
rigid preconceptions of any theory; rather, it represents an empirical approach 
to the problem of functional organization. 

Cluster Analysis of Psychological Variables 

It will also be useful to illustrate the application of cluster analysis to a series of psychological variables, for the results will be less clear-cut in their implications than were those obtained with the body measurements discussed above. For this purpose, the intercorrelations among ten achievement tests administered to 1046 Bucknell College sophomores will be used.

The data presented in Table 18:2 represent the interrelationships of student abilities on a comprehensive survey of achievement in general culture, science, English, and mathematics. The reliability of each test was satisfactory; the lowest was .90 (for the grammar test), the highest .986 (for mathematics). Interpretation of the intercorrelations in this table is somewhat more complicated than was true of the body measurement data in Table 18:1. The achievement test intercorrelations include a few negative coefficients and many correlations not significantly greater than zero. The punctuation test and the grammar test have the highest coefficient, .742.
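Whether a coefficient is “significantly greater than zero” depends on N. The Python sketch below assumes the usual test of r against zero, t = r√(N − 2)/√(1 − r²), judged against 1.96. That criterion is roughly the .05 level, two-tailed, since t is nearly normal for N = 1046. The .742 comes from the text; the other two coefficients are illustrative. The chapter’s own criterion may differ in detail, but for coefficients near zero it should lead to much the same verdict.

```python
import math

def t_for_r(r: float, n: int) -> float:
    """Test ratio for the hypothesis that the population correlation is zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def is_significant(r: float, n: int, critical: float = 1.96) -> bool:
    """Two-tailed test at roughly the .05 level (normal approximation for large n)."""
    return abs(t_for_r(r, n)) > critical

N = 1046  # sophomores tested
for r in (0.014, 0.102, 0.742):
    print(r, round(t_for_r(r, N), 2), is_significant(r, N))
```

With N = 1046, any |r| above roughly .06 passes this test. Even quite small coefficients can therefore be “significant” without indicating much common variance.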




Table 18:2. Intercorrelations of Ten Achievement Tests (1046 Bucknell College Sophomores)

A. Spelling
B. Grammar
C. Punctuation
D. Vocabulary
E. Literature
F. Mathematics
G. General science
H. Foreign literature
I. Fine arts
J. History














The chief implication to be drawn from this table is perhaps that no important set of psychological functions is common to all 10 variables. In other words, the slightly negative and zero correlation coefficients suggest the absence of Spearman’s G function or of any other defined communality for the battery of tests as a whole. This table, even more than the preceding one, emphasizes the need for statistical techniques which will enable the investigator to ascertain whether the abilities represented by a group of variables have anything in common, that is, whether they are interrelated and organized.


Fig. 18:2. Correlation Profiles for Ten Achievement Test Variables. (Data from Table 18:2)


Correlation profiles for each of these 10 variables are presented in Fig. 18:2. Although the interpretation of these results is more complicated than was true of those in Fig. 18:1, close scrutiny should yield relevant hypotheses about the psychological functions involved. Thus, several fairly distinct operational unities appear to be characteristic of the 10 variables. The first consists of A (spelling), B (grammar), and C (punctuation). This group can be called Cluster I. A second possible operational unity apparently includes H (foreign literature), I (fine arts), J (history), and possibly D (vocabulary) and E (literature). These will be called Cluster II. There remain Variables F (mathematics) and G (general science). Since there is not much congruence between their profiles, they will be considered independent variables. If the test battery had included several tests of mathematical proficiency, they might have yielded congruent correlation profiles and would therefore have formed a “mathematics cluster.” Presumably the same would be true if there had been several tests of science ability.

In summary, the correlation profiles in Fig. 18:2 suggest the following organization of mental functions, so far as they were sampled by the 10 achievement tests:

Practical English Usage Factors (Cluster I): Psychological factors or functions common to A (spelling), B (grammar), and C (punctuation); an operational unity composed of three tests which evidently sample practical English usage.

Literature Factors (Cluster II): Psychological factors or functions common to D (vocabulary), E (literature), H (foreign literature), I (fine arts), and J (history); an operational unity composed of five tests which evidently sample an understanding and appreciation of literature.

General Science Factors: Psychological factors or functions common to G (general science); a relatively independent test which evidently samples general science information.

Mathematics Factors: Psychological factors or functions common to F (mathematics); a relatively independent test which evidently samples mathematical ability.

Specific Factors (S): Psychological factors or functions common to a particular variable but not to any of the others. (The correlations are far from perfect.)

As already indicated, these results yield no evidence whatsoever for the existence of Spearman’s G factor or of any other factors or functions common to the battery as a whole. The correlation profile of the mathematics variable (F) is especially important as evidence in support of this point. The mathematics test results had practically zero correlation with all the others except general science (G), and even here the correlation was only .43. On the other hand, the trend of the correlation profile of the G variable was, in some respects, similar to the profiles of the variables in Cluster II. Hence the result suggests that ability in general science is composed of abilities in part common to the mathematics variable and in part common to the variables in Cluster II. Such an inference is not contrary to a common-sense appraisal of the characteristic content of elementary general science courses and the manifold abilities they call for.

From this cluster analysis the following practical implication for testing procedures should be clear. It should be more useful to summarize an individual’s performance on the battery by differentiating his scores into clusters and independent variables than by pooling them to obtain a single composite score for the battery as a whole. Pooling them would give an ambiguous index of achievement, because a single score would fail to reveal his relatively proficient and relatively non-proficient areas. Furthermore, pooling is clearly not warranted, because the correlation profiles in Fig. 18:2 give no evidence of the existence of any important factor or set of factors common to the battery as a whole.
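The contrast between a pooled score and cluster scores can be made concrete. The Python sketch below uses hypothetical standard (z) scores for one student. Standard scores are assumed so that tests on different scales can be averaged. The grouping is the one suggested by Fig. 18:2, and the particular numbers are invented for illustration.

```python
from statistics import mean

# Hypothetical standard (z) scores for one student on tests A to J.
scores = {"A": 1.2, "B": 1.0, "C": 1.4, "D": -0.6, "E": -0.8,
          "H": -1.0, "I": -0.7, "J": -0.9, "F": 0.1, "G": 0.2}

# Clusters and independent variables suggested by the correlation profiles.
clusters = {
    "Practical English usage (I)": ["A", "B", "C"],
    "Literature (II)": ["D", "E", "H", "I", "J"],
    "Mathematics": ["F"],
    "General science": ["G"],
}

composite = mean(scores.values())  # the single pooled score
profile = {name: mean(scores[t] for t in tests) for name, tests in clusters.items()}

print(round(composite, 2))
for name, value in profile.items():
    print(f"{name}: {value:+.2f}")
```

Here the composite is close to zero, the score of an average student, even though this student is well above average in practical English usage and well below average on the literature cluster.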

Some General Implications of Factor Analysis 

The chief contribution to psychological measurement resulting from factor analyses of psychological variables during the past several decades has been the development of a body of knowledge and theories concerning the organization and interrelations of mental abilities and other attributes. Such theories are no longer based on rational considerations alone; they are fortified by empirical data. Factor analyses of many variables derived from human behavior have provided an empirical foundation for test procedures.

Although some of the earlier theories of mental organization have been demonstrated to be inadequate, no generally acceptable theory has yet been established. It is well established, however, that many people have unusual capacities and attainments in some respects but not in others. In other words, there are important differences in ability and personality within the average person as well as between persons. Recognition of these distinctions is essential to the appraisal of the abilities and aptitudes of individuals as a basis for adequate counseling, guidance, and placement.


EXERCISES 

1. Define the coefficient of determination. What does it measure? 

2. What is the usefulness of cluster or factor analysis in psychological research? 

3. What fundamental principle underlies the usefulness of cluster or factor analysis? 

4. Set up a hypothetical battery of ten test variables and describe the kind of result 
that you would need to obtain from an intercorrelational analysis which would 
support: (a) Spearman’s two-factor theory, (b) the theory of group factors. 

5. Set up correlation profiles for the data in Table 18:3, and interpret the results. 


















APPENDIX A 


Bibliography of Statistical Tables and Nomographs, Periodical Literature, and Chief References in Mathematical and Advanced Statistics





STATISTICAL TABLES AND NOMOGRAPHS 

Buros, O. K. (ed.), The Second Yearbook of Research and Statistical Methodology, The Gryphon Press, Highland Park, New Jersey, 1941.

Chesire, L., Saffir, M., and Thurstone, L. L., Computing Diagrams for the Tetrachoric Correlation Coefficient, University of Chicago Bookstore, Chicago, 1933.

Dunlap, J. W., and Kurtz, A. K., Handbook of Statistical Nomographs, Tables, and 
Formulas, World Book Company, Yonkers, 1932. 

Fisher, R. A., and Yates, F., Statistical Tables for Biological, Agricultural and Medical 
Research, Oliver & Boyd, London, 1938. 

Kendall, M. G., and Smith, B. B., “Randomness and Random Sampling Numbers,” Journal of the Royal Statistical Society, 101:147-166, 1938.

Kurtz, A. K., and Edgerton, H. A., Statistical Dictionary of Terms and Symbols, 
Wiley, New York, 1939. 

Pearson, Karl, Tables for Statisticians and Biometricians, Cambridge University 
Press, Cambridge, 1914. 

PERIODICALS AND GOVERNMENT PUBLICATIONS 

Biometrika: A Journal for the Statistical Study of Biological Problems. Egon S. Pearson, Editor, University College, London.

Journal of the American Statistical Association. Lester S. Kellogg, Managing Editor, 
1603 K Street, N.W., Washington 6, D. C. 

Journal of Applied Psychology. Jack Dunlap, Editor, University of Rochester, 
Rochester, New York. 

Journal of Educational Research. A. S. Barr, Chairman of Editorial Board, University 
of Wisconsin, Madison 6, Wisconsin. 

Journal of the Royal Statistical Society. 4 Portugal Street, W.C. 2, London. 

National Education Association, Publications. Washington, D. C. 

Psychometrika: A Journal Devoted to the Development of Psychology as a Quantitative 
Rational Science. H. O. Gulliksen, Managing Editor, Princeton, New Jersey. 

Public Opinion Quarterly. E. F. Goldman, Editor, Princeton University Press, Princeton, New Jersey.

U. S. Bureau of the Census, Publications. Washington, D. C. 

U. S. Office of Education, Publications. Washington, D. C. 

U. S. Public Health Service, Publications. Washington, D. C. 

MATHEMATICAL STATISTICS AND ADVANCED STATISTICAL METHODS

Ezekiel, Mordecai, Methods of Correlation Analysis, John Wiley & Sons, New York, 
2nd ed., 1941. 

Fisher, R. A., The Design of Experiments, Oliver & Boyd, London, 2nd ed., 1937.

Fisher, R. A., Statistical Methods for Research Workers, Oliver & Boyd, London, 7th ed., 1938.

Kelley, T. L., Statistical Method, Macmillan, New York, 1923. 

Kenney, J. F., Mathematics of Statistics, 2 vols., Van Nostrand, New York, 1939.

Peters, C. C., and Van Voorhis, W. R., Statistical Procedures and Their Mathematical Bases, McGraw-Hill, New York, 1940.

Smith, J. G., and Duncan, A. J., Sampling Statistics and Applications, McGraw-Hill, 
New York, 1945. 

Thomson, G. H., The Factorial Analysis of Human Ability, Houghton Mifflin, Boston, 1939.

Yule, G. U., and Kendall, M. G., An Introduction to the Theory of Statistics, Griffin, 
London, 12th ed., 1940. 





APPENDIX B 

Tables of Statistical Functions

Table
I. Areas and Ordinates of the Normal Probability Curve 508
IA. Ordinate Values of the Normal Curve Expressed as Proportions of the Ordinate at the Mean 511
II. Probability Values of T for Normal Sampling Distributions of Large Sample Theory 512
III. Distribution of t for Small Samples 514
IV. Distribution of Chi-Square 515
V. Values of Functions of r 516
VI. Values of Fisher’s z Function for Given Values of Pearson’s r 518
VII. Values of Proportions p and q 519





Table I

AREAS AND ORDINATES OF THE NORMAL PROBABILITY CURVE

(In Terms of x/σ Units and a Total Area (a) Equal to 1.0)

Example: .4066 (or 40.66%) of the total area of the normal probability curve lies between the mean and a point 1.32 standard deviation units above or below the mean, i.e., x/σ = 1.32. The proportionate value of the ordinate, y, at x/σ = 1.32 is .1669.
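The entries of Table I can be checked with the error function in Python’s standard library. The area from the mean to x/σ = z is ½·erf(z/√2), and the ordinate of the unit-area curve is e^(−z²/2)/√(2π). The sketch below reproduces the worked example for z = 1.32.

```python
import math

def area_mean_to_z(z: float) -> float:
    """Proportion of the total area between the mean and z standard deviations."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

def ordinate(z: float) -> float:
    """Height of the unit normal curve (total area 1.0) at z."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

print(round(area_mean_to_z(1.32), 4), round(ordinate(1.32), 4))  # 0.4066 0.1669
```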





























































































Table I (continued)

x/σ    Area (a)   Ordinate (y)    x/σ    Area (a)   Ordinate (y)    x/σ    Area (a)   Ordinate (y)
1.20   .3849      .1942           1.70   .4554      .0940           2.20   .4861      .0355
1.21   .3869      .1919           1.71   .4564      .0925           2.21   .4864      .0347
1.22   .3888      .1895           1.72   .4573      .0909           2.22   .4868      .0339
1.23   .3907      .1872           1.73   .4582      .0893           2.23   .4871      .0332
1.24   .3925      .1849           1.74   .4591      .0878           2.24   .4875      .0325
1.25   .3944      .1826           1.75   .4599      .0863           2.25   .4878      .0317
1.26   .3962      .1804           1.76   .4608      .0848           2.26   .4881      .0310
1.27   .3980      .1781           1.77   .4616      .0833           2.27   .4884      .0303
1.28   .3997      .1758           1.78   .4625      .0818           2.28   .4887      .0297
1.29   .4015      .1736           1.79   .4633      .0804           2.29   .4890      .0290
1.30   .4032      .1714           1.80   .4641      .0790           2.30   .4893      .0283
1.31   .4049      .1691           1.81   .4649      .0775           2.31   .4896      .0277
1.32   .4066      .1669           1.82   .4656      .0761           2.32   .4898      .0270
1.33   .4082      .1647           1.83   .4664      .0748           2.33   .4901      .0264
1.34   .4099      .1626           1.84   .4671      .0734           2.34   .4904      .0258
1.35   .4115      .1604           1.85   .4678      .0721           2.35   .4906      .0252
1.36   .4131      .1582           1.86   .4686      .0707           2.36   .4909      .0246
1.37   .4147      .1561           1.87   .4693      .0694           2.37   .4911      .0241
1.38   .4162      .1539           1.88   .4700      .0681           2.38   .4913      .0235
1.39   .4177      .1518           1.89   .4706      .0669           2.39   .4916      .0229
1.40   .4192      .1497           1.90   .4713      .0656           2.40   .4918      .0224
1.41   .4207      .1476           1.91   .4719      .0644           2.41   .4920      .0219
1.42   .4222      .1456           1.92   .4726      .0632           2.42   .4922      .0213
1.43   .4236      .1435           1.93   .4732      .0620           2.43   .4925      .0208
1.44   .4251      .1415           1.94   .4738      .0608           2.44   .4927      .0203
1.45   .4265      .1394           1.95   .4744      .0596           2.45   .4929      .0198
1.46   .4279      .1374           1.96   .4750      .0584           2.46   .4931      .0194
1.47   .4292      .1354           1.97   .4756      .0573           2.47   .4932      .0189
1.48   .4306      .1334           1.98   .4762      .0562           2.48   .4934      .0184
1.49   .4319      .1315           1.99   .4767      .0551           2.49   .4936      .0180
1.50   .4332      .1295           2.00   .4772      .0540           2.50   .4938      .0175
1.51   .4345      .1276           2.01   .4778      .0529           2.51   .4940      .0171
1.52   .4357      .1257           2.02   .4783      .0519           2.52   .4941      .0167
1.53   .4370      .1238           2.03   .4788      .0508           2.53   .4943      .0163
1.54   .4382      .1219           2.04   .4793      .0498           2.54   .4945      .0158
1.55   .4394      .1200           2.05   .4798      .0488           2.55   .4946      .0154
1.56   .4406      .1182           2.06   .4803      .0478           2.56   .4948      .0151
1.57   .4418      .1163           2.07   .4808      .0468           2.57   .4949      .0147
1.58   .4430      .1145           2.08   .4812      .0459           2.58   .4951      .0143
1.59   .4441      .1127           2.09   .4817      .0449           2.59   .4952      .0139
1.60   .4452      .1109           2.10   .4821      .0440           2.60   .4953      .0136
1.61   .4463      .1092           2.11   .4826      .0431           2.61   .4955      .0132
1.62   .4474      .1074           2.12   .4830      .0422           2.62   .4956      .0129
1.63   .4484      .1057           2.13   .4834      .0413           2.63   .4957      .0126
1.64   .4495      .1040           2.14   .4838      .0404           2.64   .4959      .0122
1.65   .4505      .1023           2.15   .4842      .0396           2.65   .4960      .0119
1.66   .4515      .1006           2.16   .4846      .0387           2.66   .4961      .0116
1.67   .4525      .0989           2.17   .4850      .0379           2.67   .4962      .0113
1.68   .4535      .0973           2.18   .4854      .0371           2.68   .4963      .0110
1.69   .4545      .0957           2.19   .4857      .0363           2.69   .4964      .0107















































Table I (continued)

x/σ    Area (a)   Ordinate (y)    x/σ    Area (a)   Ordinate (y)    x/σ    Area (a)   Ordinate (y)
2.70   .4965      .0104           2.80   .4974      .0079           2.90   .4981      .0060
2.71   .4966      .0101           2.81   .4975      .0077           2.91   .4982      .0058
2.72   .4967      .0099           2.82   .4976      .0075           2.92   .4982      .0056
2.73   .4968      .0096           2.83   .4977      .0073           2.93   .4983      .0055
2.74   .4969      .0093           2.84   .4977      .0071           2.94   .4984      .0053
2.75   .4970      .0091           2.85   .4978      .0069           2.95   .4984      .0051
2.76   .4971      .0088           2.86   .4979      .0067           2.96   .4985      .0050
2.77   .4972      .0086           2.87   .4980      .0065           2.97   .4985      .0048
2.78   .4973      .0084           2.88   .4980      .0063           2.98   .4986      .0047
2.79   .4974      .0081           2.89   .4981      .0061           2.99   .4986      .0046

3.00   .49865     .0044
3.50   .49977     .0009
4.00   .49997     .0001
4.50   .499997    .00002
5.00   .4999997   .000002






Table IA

ORDINATE VALUES OF THE NORMAL, BELL-SHAPED PROBABILITY CURVE, EXPRESSED AS PROPORTIONS OF THE ORDINATE AT THE MEAN

Thus, the height of the mean ordinate is taken as 1.000; the ordinate at a point 2σ above or below the mean is .135 as high as the ordinate at the mean. The mean ordinate for a finite distribution is y₀ = Ni/2.5066σ. See page 432.
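Table IA is simply e^(−z²/2), the normal ordinate divided by the ordinate at the mean. The Python sketch below computes that ratio, together with the mean ordinate of a fitted curve. For the mean ordinate it assumes the standard form y₀ = Ni/(σ√(2π)), with N cases, class interval i, and σ in the same units as i; set i = 1 when σ is already in class-interval units. The sample N, i, and σ are hypothetical.

```python
import math

def ordinate_ratio(z: float) -> float:
    """Ordinate at z expressed as a proportion of the ordinate at the mean."""
    return math.exp(-z * z / 2)

def mean_ordinate(n: int, i: float, sigma: float) -> float:
    """Mean ordinate of a normal curve fitted to n cases grouped in intervals of width i
    (assumed form y0 = N*i / (sigma * sqrt(2*pi)))."""
    return n * i / (sigma * math.sqrt(2 * math.pi))

print(round(ordinate_ratio(2.0), 3))      # 0.135, as in the note above
print(round(mean_ordinate(200, 5, 10), 2))  # hypothetical N = 200, i = 5, sigma = 10
```

Multiplying a Table IA entry by y₀ gives the expected height of the fitted curve at that point.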


x/σ   .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
0.0   1.000  .999+  .999+  .999+  .999+  .999   .998   .998   .997   .996
0.1   .995   .994   .993   .992   .990   .989   .987   .986   .984   .982
0.2   .980   .978   .976   .974   .972   .969   .967   .964   .962   .959
0.3   .956   .953   .950   .947   .944   .941   .937   .934   .930   .927
0.4   .923   .918   .916   .912   .908   .904   .900   .895   .891   .887
0.5   .882   .878   .874   .869   .864   .860   .855   .850   .845   .841
0.6   .835   .830   .825   .820   .815   .810   .804   .799   .794   .788
0.7   .783   .777   .772   .766   .760   .755   .749   .743   .738   .732
0.8   .726   .720   .714   .709   .703   .697   .691   .685   .679   .673
0.9   .667   .661   .655   .649   .643   .637   .631   .625   .619   .613
1.0   .607   .600   .594   .588   .582   .576   .570   .564   .558   .552
1.1   .546   .540   .534   .528   .522   .516   .510   .504   .498   .493
1.2   .487   .481   .475   .469   .464   .458   .452   .446   .441   .435
1.3   .430   .424   .418   .413   .407   .402   .397   .391   .386   .381
1.4   .375   .370   .365   .360   .355   .350   .344   .339   .334   .330
1.5   .325   .320   .315   .310   .306   .301   .296   .292   .287   .283
1.6   .278   .274   .269   .265   .261   .256   .252   .248   .244   .240
1.7   .236   .232   .228   .224   .220   .216   .213   .209   .205   .201
1.8   .198   .194   .191   .187   .184   .181   .177   .174   .171   .168
1.9   .164   .161   .158   .155   .152   .149   .146   .144   .141   .138
2.0   .135   .133   .130   .127   .125   .122   .120   .117   .115   .113
2.1   .110   .108   .106   .103   .101   .099   .097   .095   .093   .091
2.2   .089   .087   .085   .083   .081   .080   .078   .076   .074   .073
2.3   .071   .069   .068   .066   .065   .063   .062   .060   .059   .058
2.4   .056   .055   .054   .052   .051   .050   .049   .047   .046   .045
2.5   .044   .043   .042   .041   .040   .039   .038   .037   .036   .035
2.6   .034   .033   .032   .031   .031   .030   .029   .028   .028   .027
2.7   .026   .025   .025   .024   .023   .023   .022   .022   .021   .020
2.8   .020   .019   .019   .018   .018   .017   .017   .016   .016   .015
2.9   .015   .014   .014   .014   .013   .013   .013   .012   .012   .011
3.0   .011
4.0   .0003
5.0   .00000
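Each entry of Table IA is e^(−x²/2), where x is the distance from the mean in σ units; multiplying an entry by the mean ordinate y₀ = N/(σ√2π) gives the height of the curve at that point. The sketch below assumes σ is expressed in class-interval units, so that the ordinate reads as a frequency per interval; the 500 cases and σ = 10 are hypothetical, and the function names are illustrative only.

```python
import math

def ordinate_ratio(x):
    """Height of the normal curve at x (σ units) as a proportion of the mean ordinate."""
    return math.exp(-x * x / 2)

def mean_ordinate(n_cases, sigma):
    """y0 = N / (σ√(2π)); sigma is assumed to be in class-interval units."""
    return n_cases / (sigma * math.sqrt(2 * math.pi))

# Hypothetical distribution: N = 500 cases, σ = 10 class intervals.
y0 = mean_ordinate(500, 10)
for x in (0.0, 1.0, 2.0):
    # Expected frequency per interval at a point x σ from the mean.
    print(f"x = {x:.1f}σ  ratio = {ordinate_ratio(x):.3f}  y = {y0 * ordinate_ratio(x):.2f}")
```

Working with ratios keeps the table independent of N and σ; only the single multiplication by y₀ depends on the particular distribution.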


















Table II 

PROBABILITY VALUES FOR T OF NORMAL SAMPLING DISTRIBUTIONS
OF LARGE SAMPLE THEORY

Example: If T, the test ratio (s − h)/σs of a Test of Significance, is 2.0, the P
(probability) value is .0228 for a result equal to or larger than the sample value of
the statistic s (or equal to or smaller than it, depending on which tail of the sampling
distribution is involved).


   T  P             T  P             T  P             T  P
 .00  .5000       .45  .3264       .90  .1841      1.35  .0885
 .01  .4960       .46  .3228       .91  .1814      1.36  .0869
 .02  .4920       .47  .3192       .92  .1788      1.37  .0853
 .03  .4880       .48  .3156       .93  .1762      1.38  .0838
 .04  .4840       .49  .3121       .94  .1736      1.39  .0823
 .05  .4801       .50  .3085       .95  .1711      1.40  .0808
 .06  .4761       .51  .3050       .96  .1685      1.41  .0793
 .07  .4721       .52  .3015       .97  .1660      1.42  .0778
 .08  .4681       .53  .2981       .98  .1635      1.43  .0764
 .09  .4641       .54  .2946       .99  .1611      1.44  .0749
 .10  .4602       .55  .2912      1.00  .1587      1.45  .0735
 .11  .4562       .56  .2877      1.01  .1562      1.46  .0721
 .12  .4522       .57  .2843      1.02  .1539      1.47  .0708
 .13  .4483       .58  .2810      1.03  .1515      1.48  .0694
 .14  .4443       .59  .2776      1.04  .1492      1.49  .0681
 .15  .4404       .60  .2742      1.05  .1469      1.50  .0668
 .16  .4364       .61  .2709      1.06  .1446      1.51  .0655
 .17  .4325       .62  .2676      1.07  .1423      1.52  .0643
 .18  .4286       .63  .2643      1.08  .1401      1.53  .0630
 .19  .4246       .64  .2611      1.09  .1379      1.54  .0618
 .20  .4207       .65  .2578      1.10  .1357      1.55  .0606
 .21  .4168       .66  .2546      1.11  .1335      1.56  .0594
 .22  .4129       .67  .2514      1.12  .1314      1.57  .0582
 .23  .4090       .68  .2482      1.13  .1292      1.58  .0570
 .24  .4052       .69  .2451      1.14  .1271      1.59  .0559
 .25  .4013       .70  .2420      1.15  .1251      1.60  .0548
 .26  .3974       .71  .2388      1.16  .1230      1.61  .0537
 .27  .3936       .72  .2358      1.17  .1210      1.62  .0526
 .28  .3897       .73  .2327      1.18  .1190      1.63  .0516
 .29  .3859       .74  .2296      1.19  .1170      1.64  .0505
 .30  .3821       .75  .2266      1.20  .1151      1.65  .0495
 .31  .3783       .76  .2236      1.21  .1131      1.66  .0485
 .32  .3745       .77  .2206      1.22  .1112      1.67  .0475
 .33  .3707       .78  .2177      1.23  .1093      1.68  .0465
 .34  .3669       .79  .2148      1.24  .1075      1.69  .0455
 .35  .3632       .80  .2119      1.25  .1056      1.70  .0446
 .36  .3594       .81  .2090      1.26  .1038      1.71  .0436
 .37  .3557       .82  .2061      1.27  .1020      1.72  .0427
 .38  .3520       .83  .2033      1.28  .1003      1.73  .0418
 .39  .3483       .84  .2004      1.29  .0985      1.74  .0409
 .40  .3446       .85  .1977      1.30  .0968      1.75  .0401
 .41  .3409       .86  .1949      1.31  .0951      1.76  .0392
 .42  .3372       .87  .1922      1.32  .0934      1.77  .0384
 .43  .3336       .88  .1894      1.33  .0918      1.78  .0375
 .44  .3300       .89  .1867      1.34  .0901      1.79  .0367

Table II (continued)

   T  P             T  P             T  P             T  P
1.80  .0359      2.10  .0179      2.40  .0082      2.70  .0035
1.81  .0351      2.11  .0174      2.41  .0080      2.71  .0034
1.82  .0344      2.12  .0170      2.42  .0078      2.72  .0033
1.83  .0336      2.13  .0166      2.43  .0075      2.73  .0032
1.84  .0329      2.14  .0162      2.44  .0073      2.74  .0031
1.85  .0322      2.15  .0158      2.45  .0071      2.75  .0030
1.86  .0314      2.16  .0154      2.46  .0069      2.76  .0029
1.87  .0307      2.17  .0150      2.47  .0068      2.77  .0028
1.88  .0300      2.18  .0146      2.48  .0066      2.78  .0027
1.89  .0294      2.19  .0143      2.49  .0064      2.79  .0026
1.90  .0287      2.20  .0139      2.50  .0062      2.80  .0026
1.91  .0281      2.21  .0136      2.51  .0060      2.81  .0025
1.92  .0274      2.22  .0132      2.52  .0059      2.82  .0024
1.93  .0268      2.23  .0129      2.53  .0057      2.83  .0023
1.94  .0262      2.24  .0125      2.54  .0055      2.84  .0023
1.95  .0256      2.25  .0122      2.55  .0054      2.85  .0022
1.96  .0250      2.26  .0119      2.56  .0052      2.86  .0021
1.97  .0244      2.27  .0116      2.57  .0051      2.87  .0020
1.98  .0238      2.28  .0113      2.58  .0049      2.88  .0020
1.99  .0233      2.29  .0110      2.59  .0048      2.89  .0019
2.00  .0228      2.30  .0107      2.60  .0047      2.90  .0019
2.01  .0222      2.31  .0104      2.61  .0045      2.91  .0018
2.02  .0217      2.32  .0102      2.62  .0044      2.92  .0018
2.03  .0212      2.33  .0099      2.63  .0043      2.93  .0017
2.04  .0207      2.34  .0096      2.64  .0041      2.94  .0016
2.05  .0202      2.35  .0094      2.65  .0040      2.95  .0016
2.06  .0197      2.36  .0091      2.66  .0039      2.96  .0015
2.07  .0192      2.37  .0089      2.67  .0038      2.97  .0015
2.08  .0188      2.38  .0087      2.68  .0037      2.98  .0014
2.09  .0183      2.39  .0084      2.69  .0036      2.99  .0014

3.00  .00135





Table III

DISTRIBUTION OF t FOR TESTS OF SIGNIFICANCE OF SMALL SAMPLES*

                        Probability, P
N − 1       .5       .1      .05      .02      .01     .001
    1    1.000    6.314   12.706   31.821   63.657  636.619
    2     .816    2.920    4.303    6.965    9.925   31.598
    3     .765    2.353    3.182    4.541    5.841   12.941
    4     .741    2.132    2.776    3.747    4.604    8.610
    5     .727    2.015    2.571    3.365    4.032    6.859
    6     .718    1.943    2.447    3.143    3.707    5.959
    7     .711    1.895    2.365    2.998    3.499    5.405
    8     .706    1.860    2.306    2.896    3.355    5.041
    9     .703    1.833    2.262    2.821    3.250    4.781
   10     .700    1.812    2.228    2.764    3.169    4.587
   11     .697    1.796    2.201    2.718    3.106    4.437
   12     .695    1.782    2.179    2.681    3.055    4.318
   13     .694    1.771    2.160    2.650    3.012    4.221
   14     .692    1.761    2.145    2.624    2.977    4.140
   15     .691    1.753    2.131    2.602    2.947    4.073
   16     .690    1.746    2.120    2.583    2.921    4.015
   17     .689    1.740    2.110    2.567    2.898    3.965
   18     .688    1.734    2.101    2.552    2.878    3.922
   19     .688    1.729    2.093    2.539    2.861    3.883
   20     .687    1.725    2.086    2.528    2.845    3.850
   21     .686    1.721    2.080    2.518    2.831    3.819
   22     .686    1.717    2.074    2.508    2.819    3.792
   23     .685    1.714    2.069    2.500    2.807    3.767
   24     .685    1.711    2.064    2.492    2.797    3.745
   25     .684    1.708    2.060    2.485    2.787    3.725
   26     .684    1.706    2.056    2.479    2.779    3.707
   27     .684    1.703    2.052    2.473    2.771    3.690
   28     .683    1.701    2.048    2.467    2.763    3.674
   29     .683    1.699    2.045    2.462    2.756    3.659
   30     .683    1.697    2.042    2.457    2.750    3.646
   40     .681    1.684    2.021    2.423    2.704    3.551
   60     .679    1.671    2.000    2.390    2.660    3.460
  120     .677    1.658    1.980    2.358    2.617    3.373
    ∞     .674    1.645    1.960    2.326    2.576    3.291


* Table III is abridged from Table III of Fisher: Statistical Tables for Biological, Agri-
cultural and Medical Research, Oliver & Boyd, Ltd., Edinburgh, by permission of the Author
and Publishers.
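The entries of Table III are values of t that are equalled or exceeded in absolute value (both tails together) with probability P for N − 1 degrees of freedom; the last row, for ∞, coincides with the normal-curve values of Table II doubled. The sketch below, assuming that two-tailed reading, goes the other way and recovers P for a given t by integrating Student's density numerically with Simpson's rule. The function names and the step count are illustrative choices, and a statistical library would normally be used instead.

```python
import math

def t_density(x, df):
    """Student's t density with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_tailed_p(t, df, steps=2000):
    """P of |t| at least this large, by Simpson's rule on [0, |t|]; steps must be even."""
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps
    total = t_density(0.0, df) + t_density(t, df)
    for i in range(1, steps):
        # Simpson weights: 4 at odd interior points, 2 at even interior points.
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 1.0 - 2.0 * total * h / 3.0

for df, t in ((1, 12.706), (10, 2.228), (30, 2.042)):
    print(f"df = {df:2d}  t = {t:6.3f}  P = {two_tailed_p(t, df):.3f}")
```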










Table IV 

DISTRIBUTION OF CHI-SQUARE* 


                                   Probability, P
d.f.    .99    .95    .90    .50    .30    .20    .10    .05    .02    .01   .001
   1    .00    .00    .02    .46   1.07   1.64   2.71   3.84   5.41   6.64  10.83
   2    .02    .10    .21   1.39   2.41   3.22   4.60   5.99   7.82   9.21  13.82
   3    .12    .35    .58   2.37   3.66   4.64   6.25   7.82   9.84  11.34  16.27
   4    .30    .71   1.06   3.36   4.88   5.99   7.78   9.49  11.67  13.28  18.46
   5    .55   1.14   1.61   4.35   6.06   7.29   9.24  11.07  13.39  15.09  20.52
   6    .87   1.64   2.20   5.35   7.23   8.56  10.64  12.59  15.03  16.81  22.46
   7   1.24   2.17   2.83   6.35   8.38   9.80  12.02  14.07  16.62  18.48  24.32
   8   1.65   2.73   3.49   7.34   9.52  11.03  13.36  15.51  18.17  20.09  26.12
   9   2.09   3.32   4.17   8.34  10.66  12.24  14.68  16.92  19.68  21.67  27.88
  10   2.56   3.94   4.86   9.34  11.78  13.44  15.99  18.31  21.16  23.21  29.59
  11   3.05   4.58   5.58  10.34  12.90  14.63  17.28  19.68  22.62  24.72  31.26
  12   3.57   5.23   6.30  11.34  14.01  15.81  18.55  21.03  24.05  26.22  32.91
  13   4.11   5.89   7.04  12.34  15.12  16.98  19.81  22.36  25.47  27.69  34.53
  14   4.66   6.57   7.79  13.34  16.22  18.15  21.06  23.68  26.87  29.14  36.12
  15   5.23   7.26   8.55  14.34  17.32  19.31  22.31  25.00  28.26  30.58  37.70
  16   5.81   7.96   9.31  15.34  18.42  20.46  23.54  26.30  29.63  32.00  39.25
  17   6.41   8.67  10.08  16.34  19.51  21.62  24.77  27.59  31.00  33.41  40.79
  18   7.02   9.39  10.86  17.34  20.60  22.76  25.99  28.87  32.35  34.80  42.31
  19   7.63  10.12  11.65  18.34  21.69  23.90  27.20  30.14  33.69  36.19  43.82
  20   8.26  10.85  12.44  19.34  22.78  25.04  28.41  31.41  35.02  37.57  45.32
  21   8.90  11.59  13.24  20.34  23.86  26.17  29.62  32.67  36.34  38.93  46.80
  22   9.54  12.34  14.04  21.34  24.94  27.30  30.81  33.92  37.66  40.29  48.27
  23  10.20  13.09  14.85  22.34  26.02  28.43  32.01  35.17  38.97  41.64  49.73
  24  10.86  13.85  15.66  23.34  27.10  29.55  33.20  36.42  40.27  42.98  51.18
  25  11.52  14.61  16.47  24.34  28.17  30.68  34.38  37.65  41.57  44.31  52.62
  26  12.20  15.38  17.29  25.34  29.25  31.80  35.56  38.88  42.86  45.64  54.05
  27  12.88  16.15  18.11  26.34  30.32  32.91  36.74  40.11  44.14  46.96  55.48
  28  13.56  16.93  18.94  27.34  31.39  34.03  37.92  41.34  45.42  48.28  56.89
  29  14.26  17.72  19.77  28.34  32.46  35.14  39.09  42.56  46.69  49.59  58.30
  30  14.95  18.49  20.60  29.34  33.53  36.25  40.26  43.77  47.96  50.89  59.70


* Table IV is abridged from Table IV of Fisher: Statistical Tables for Biological, 
Agricultural and Medical Research, Oliver & Boyd, Ltd., Edinburgh, by permission of the 
author and Publishers. 
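The entries of Table IV are values of chi-square that are equalled or exceeded with probability P for the given degrees of freedom. A minimal sketch of the reverse lookup (P for an observed chi-square) using the standard series for the lower incomplete gamma function; the term limit and stopping tolerance are assumptions that are adequate for the range of this table but are not tuned for extreme values, and the function name is illustrative.

```python
import math

def chi_square_p(chi2, df, terms=1000):
    """P of a chi-square at least this large, for df degrees of freedom."""
    if chi2 <= 0:
        return 1.0
    a, x = df / 2.0, chi2 / 2.0
    # Series: γ(a, x) = x^a e^(−x) Σ x^n / (a (a+1) ... (a+n)).
    term = 1.0 / a
    total = term
    for n in range(1, terms):
        term *= x / (a + n)
        total += term
        if term < total * 1e-15:
            break
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))  # regularized
    return 1.0 - lower

for df, chi2 in ((1, 3.84), (10, 18.31), (30, 43.77)):
    print(f"d.f. = {df:2d}  chi-square = {chi2:5.2f}  P = {chi_square_p(chi2, df):.3f}")
```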







Table V

VALUES OF FUNCTIONS OF r* 


(k = √(1 − r²); % Eff. = 100(1 − k))

r      √r      r²      √(r−r²)  √(1−r)  1−r²    k       % Eff.   r
1.00   1.0000  1.0000  .0000    .0000   .0000   .0000   100.00   1.00
.99    .9950   .9801   .0995    .1000   .0199   .1411   85.89    .99
.98    .9899   .9604   .1400    .1414   .0396   .1990   80.10    .98
.97    .9849   .9409   .1706    .1732   .0591   .2431   75.69    .97
.96    .9798   .9216   .1960    .2000   .0784   .2800   72.00    .96
.95    .9747   .9025   .2179    .2236   .0975   .3122   68.78    .95
.94    .9695   .8836   .2375    .2449   .1164   .3412   65.88    .94
.93    .9644   .8649   .2551    .2646   .1351   .3676   63.24    .93
.92    .9592   .8464   .2713    .2828   .1536   .3919   60.81    .92
.91    .9539   .8281   .2862    .3000   .1719   .4146   58.54    .91
.90    .9487   .8100   .3000    .3162   .1900   .4359   56.41    .90
.89    .9434   .7921   .3129    .3317   .2079   .4560   54.40    .89
.88    .9381   .7744   .3250    .3464   .2256   .4750   52.50    .88
.87    .9327   .7569   .3363    .3606   .2431   .4931   50.69    .87
.86    .9274   .7396   .3470    .3742   .2604   .5103   48.97    .86
.85    .9220   .7225   .3571    .3873   .2775   .5268   47.32    .85
.84    .9165   .7056   .3666    .4000   .2944   .5426   45.74    .84
.83    .9110   .6889   .3756    .4123   .3111   .5578   44.22    .83
.82    .9055   .6724   .3842    .4243   .3276   .5724   42.76    .82
.81    .9000   .6561   .3923    .4359   .3439   .5864   41.36    .81
.80    .8944   .6400   .4000    .4472   .3600   .6000   40.00    .80
.79    .8888   .6241   .4073    .4583   .3759   .6131   38.69    .79
.78    .8832   .6084   .4142    .4690   .3916   .6258   37.42    .78
.77    .8775   .5929   .4208    .4796   .4071   .6380   36.20    .77
.76    .8718   .5776   .4271    .4899   .4224   .6499   35.01    .76
.75    .8660   .5625   .4330    .5000   .4375   .6614   33.86    .75
.74    .8602   .5476   .4386    .5099   .4524   .6726   32.74    .74
.73    .8544   .5329   .4440    .5196   .4671   .6834   31.66    .73
.72    .8485   .5184   .4490    .5292   .4816   .6940   30.60    .72
.71    .8426   .5041   .4538    .5385   .4959   .7042   29.58    .71
.70    .8367   .4900   .4583    .5477   .5100   .7141   28.59    .70
.69    .8307   .4761   .4625    .5568   .5239   .7238   27.62    .69
.68    .8246   .4624   .4665    .5657   .5376   .7332   26.68    .68
.67    .8185   .4489   .4702    .5745   .5511   .7424   25.76    .67
.66    .8124   .4356   .4737    .5831   .5644   .7513   24.87    .66
.65    .8062   .4225   .4770    .5916   .5775   .7599   24.01    .65
.64    .8000   .4096   .4800    .6000   .5904   .7684   23.16    .64
.63    .7937   .3969   .4828    .6083   .6031   .7766   22.34    .63
.62    .7874   .3844   .4854    .6164   .6156   .7846   21.54    .62
.61    .7810   .3721   .4877    .6245   .6279   .7924   20.76    .61
.60    .7746   .3600   .4899    .6325   .6400   .8000   20.00    .60
.59    .7681   .3481   .4918    .6403   .6519   .8074   19.26    .59
.58    .7616   .3364   .4936    .6481   .6636   .8146   18.54    .58
.57    .7550   .3249   .4951    .6557   .6751   .8216   17.84    .57
.56    .7483   .3136   .4964    .6633   .6864   .8285   17.15    .56
.55    .7416   .3025   .4975    .6708   .6975   .8352   16.48    .55
.54    .7348   .2916   .4984    .6782   .7084   .8417   15.83    .54
.53    .7280   .2809   .4991    .6856   .7191   .8480   15.20    .53
.52    .7211   .2704   .4996    .6928   .7296   .8542   14.58    .52
.51    .7141   .2601   .4999    .7000   .7399   .8602   13.98    .51
.50    .7071   .2500   .5000    .7071   .7500   .8660   13.40    .50


* From W. V. Bingham, Aptitudes and Aptitude Testing, Harper & Brothers, New York,
1937, Table XVIII.












Table V (continued)


r      √r      r²      √(r−r²)  √(1−r)  1−r²    k       % Eff.   r
.50    .7071   .2500   .5000    .7071   .7500   .8660   13.40    .50
.49    .7000   .2401   .4999    .7141   .7599   .8717   12.83    .49
.48    .6928   .2304   .4996    .7211   .7696   .8773   12.27    .48
.47    .6856   .2209   .4991    .7280   .7791   .8827   11.73    .47
.46    .6782   .2116   .4984    .7348   .7884   .8879   11.21    .46
.45    .6708   .2025   .4975    .7416   .7975   .8930   10.70    .45
.44    .6633   .1936   .4964    .7483   .8064   .8980   10.20    .44
.43    .6557   .1849   .4951    .7550   .8151   .9028   9.72     .43
.42    .6481   .1764   .4936    .7616   .8236   .9075   9.25     .42
.41    .6403   .1681   .4918    .7681   .8319   .9121   8.79     .41
.40    .6325   .1600   .4899    .7746   .8400   .9165   8.35     .40
.39    .6245   .1521   .4877    .7810   .8479   .9208   7.92     .39
.38    .6164   .1444   .4854    .7874   .8556   .9250   7.50     .38
.37    .6083   .1369   .4828    .7937   .8631   .9290   7.10     .37
.36    .6000   .1296   .4800    .8000   .8704   .9330   6.70     .36
.35    .5916   .1225   .4770    .8062   .8775   .9367   6.33     .35
.34    .5831   .1156   .4737    .8124   .8844   .9404   5.96     .34
.33    .5745   .1089   .4702    .8185   .8911   .9440   5.60     .33
.32    .5657   .1024   .4665    .8246   .8976   .9474   5.25     .32
.31    .5568   .0961   .4625    .8307   .9039   .9507   4.93     .31
.30    .5477   .0900   .4583    .8367   .9100   .9539   4.61     .30
.29    .5385   .0841   .4538    .8426   .9159   .9570   4.30     .29
.28    .5292   .0784   .4490    .8485   .9216   .9600   4.00     .28
.27    .5196   .0729   .4440    .8544   .9271   .9629   3.71     .27
.26    .5099   .0676   .4386    .8602   .9324   .9656   3.44     .26
.25    .5000   .0625   .4330    .8660   .9375   .9682   3.18     .25
.24    .4899   .0576   .4271    .8718   .9424   .9708   2.92     .24
.23    .4796   .0529   .4208    .8775   .9471   .9732   2.68     .23
.22    .4690   .0484   .4142    .8832   .9516   .9755   2.45     .22
.21    .4583   .0441   .4073    .8888   .9559   .9777   2.23     .21
.20    .4472   .0400   .4000    .8944   .9600   .9798   2.02     .20
.19    .4359   .0361   .3923    .9000   .9639   .9818   1.82     .19
.18    .4243   .0324   .3842    .9055   .9676   .9837   1.63     .18
.17    .4123   .0289   .3756    .9110   .9711   .9854   1.46     .17
.16    .4000   .0256   .3666    .9165   .9744   .9871   1.29     .16
.15    .3873   .0225   .3571    .9220   .9775   .9887   1.13     .15
.14    .3742   .0196   .3470    .9274   .9804   .9902   .98      .14
.13    .3606   .0169   .3363    .9327   .9831   .9915   .85      .13
.12    .3464   .0144   .3250    .9381   .9856   .9928   .72      .12
.11    .3317   .0121   .3129    .9434   .9879   .9939   .61      .11
.10    .3162   .0100   .3000    .9487   .9900   .9950   .50      .10
.09    .3000   .0081   .2862    .9539   .9919   .9959   .41      .09
.08    .2828   .0064   .2713    .9592   .9936   .9968   .32      .08
.07    .2646   .0049   .2551    .9644   .9951   .9975   .25      .07
.06    .2449   .0036   .2375    .9695   .9964   .9982   .18      .06
.05    .2236   .0025   .2179    .9747   .9975   .9987   .13      .05
.04    .2000   .0016   .1960    .9798   .9984   .9992   .08      .04
.03    .1732   .0009   .1706    .9849   .9991   .9995   .05      .03
.02    .1414   .0004   .1400    .9899   .9996   .9998   .02      .02
.01    .1000   .0001   .0995    .9950   .9999   .9999   .01      .01
.00    .0000   .0000   .0000    1.0000  1.0000  1.0000  .00      .00
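Every row of Table V follows from r alone. In the sketch below, k = √(1 − r²) is the quantity usually called the coefficient of alienation, and 100(1 − k) is the per cent of forecasting efficiency shown in the % Eff. column; the function name and dictionary keys are hypothetical labels chosen for this example.

```python
import math

def functions_of_r(r):
    """Return the entries of one row of Table V for a correlation 0 <= r <= 1."""
    r2 = r * r
    k = math.sqrt(1 - r2)  # coefficient of alienation
    return {
        "sqrt_r": math.sqrt(r),
        "r_squared": r2,
        "sqrt_r_minus_r2": math.sqrt(r - r2),
        "sqrt_1_minus_r": math.sqrt(1 - r),
        "one_minus_r2": 1 - r2,
        "k": k,
        "pct_eff": 100 * (1 - k),  # per cent of forecasting efficiency
    }

for r in (0.95, 0.60, 0.30):
    row = functions_of_r(r)
    print(f"r = {r:.2f}  k = {row['k']:.4f}  % Eff. = {row['pct_eff']:.2f}")
```

The output makes the table's main point concrete: an r of .60 reduces the error of estimate by only 20 per cent.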








Table VI 

VALUES OF FISHER’S z FUNCTION FOR GIVEN VALUES OF PEARSON'S r* 


r      z        r      z        r      z        r      z
.00    .00      .25    .26      .50    .55      .75    .97
.01    .01      .26    .27      .51    .56      .76    1.00
.02    .02      .27    .28      .52    .58      .77    1.02
.03    .03      .28    .29      .53    .59      .78    1.05
.04    .04      .29    .30      .54    .60      .79    1.07
.05    .05      .30    .31      .55    .62      .80    1.10
.06    .06      .31    .32      .56    .63      .81    1.13
.07    .07      .32    .33      .57    .65      .82    1.16
.08    .08      .33    .34      .58    .66      .83    1.19
.09    .09      .34    .35      .59    .68      .84    1.22
.10    .10      .35    .37      .60    .69      .85    1.26
.11    .11      .36    .38      .61    .71      .86    1.29
.12    .12      .37    .39      .62    .73      .87    1.33
.13    .13      .38    .40      .63    .74      .88    1.38
.14    .14      .39    .41      .64    .76      .89    1.42
.15    .15      .40    .42      .65    .78      .90    1.47
.16    .16      .41    .44      .66    .79      .91    1.53
.17    .17      .42    .45      .67    .81      .92    1.59
.18    .18      .43    .46      .68    .83      .93    1.66
.19    .19      .44    .47      .69    .85      .94    1.74
.20    .20      .45    .48      .70    .87      .95    1.83
.21    .21      .46    .50      .71    .89      .96    1.95
.22    .22      .47    .51      .72    .91      .97    2.09
.23    .23      .48    .52      .73    .93      .98    2.30
.24    .24      .49    .54      .74    .95      .99    2.65


* Table VI is adapted from Table VII of Fisher: Statistical Tables for Biological,
Agricultural and Medical Research, Oliver & Boyd, Ltd., Edinburgh, by permission of the
Author and Publishers.
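Fisher's z is ½ ln[(1 + r)/(1 − r)], which is the inverse hyperbolic tangent of r, so Table VI and its inverse can both be computed directly. A short sketch; the function names are illustrative:

```python
import math

def fisher_z(r):
    """Fisher's z = ½ ln((1 + r) / (1 − r)) for Pearson's r, with −1 < r < 1."""
    return 0.5 * math.log((1 + r) / (1 - r))

def r_from_z(z):
    """Inverse transformation: r = tanh(z)."""
    return math.tanh(z)

for r in (0.25, 0.50, 0.75, 0.99):
    print(f"r = {r:.2f}  z = {fisher_z(r):.2f}")
```

The same values come from `math.atanh(r)`; the explicit logarithm is kept here only so the code mirrors the formula.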








Table VII 

VALUES OF PROPORTIONS p AND q* 

(Values Employed in the Determination of Biserial and Point-Biserial Correlations) 


(1)    (2)    (3)      (4)      (5)      (6)      (7)
p      q      pq       p/y      pq/y     √pq      √(p/q)
.01    .99    .0099    .3745    .3700    .0994    .1005
.02    .98    .0196    .4132    .3935    .1380    .1428
.03    .97    .0291    .4412    .4264    .1703    .1758
.04    .96    .0384    .4640    .4452    .1959    .2042
.05    .95    .0475    .4850    .4605    .2179    .2293
.06    .94    .0564    .5038    .4736    .2375    .2526
.07    .93    .0651    .5212    .4844    .2551    .2744
.08    .92    .0736    .5380    .4950    .2713    .2950
.09    .91    .0819    .5542    .5044    .2862    .3145
.10    .90    .0900    .5698    .5129    .3000    .3333
.11    .89    .0979    .5851    .5207    .3129    .3416
.12    .88    .1056    .6000    .5278    .3249    .3693
.13    .87    .1131    .6147    .5347    .3363    .3865
.14    .86    .1204    .6289    .5410    .3470    .4035
.15    .85    .1275    .6432    .5469    .3571    .4201
.16    .84    .1344    .6576    .5523    .3666    .4365
.17    .83    .1411    .6717    .5574    .3756    .4525
.18    .82    .1476    .6860    .5627    .3842    .4685
.19    .81    .1539    .7001    .5670    .3923    .4844
.20    .80    .1600    .7143    .5714    .4000    .5000
.21    .79    .1659    .7287    .5758    .4073    .5156
.22    .78    .1716    .7430    .5793    .4142    .5311
.23    .77    .1771    .7576    .5832    .4208    .5465
.24    .76    .1824    .7720    .5868    .4271    .5620
.25    .75    .1875    .7867    .5900    .4330    .5773
.26    .74    .1924    .8015    .5929    .4386    .5928
.27    .73    .1971    .8167    .5960    .4439    .6082
.28    .72    .2016    .8318    .5989    .4490    .6236
.29    .71    .2059    .8472    .6016    .4538    .6391
.30    .70    .2100    .8628    .6037    .4582    .6547
.31    .69    .2139    .8787    .6062    .4625    .6703
.32    .68    .2176    .8949    .6086    .4665    .6860
.33    .67    .2211    .9114    .6107    .4702    .7018
.34    .66    .2244    .9279    .6125    .4737    .7178
.35    .65    .2275    .9449    .6143    .4770    .7338
.36    .64    .2304    .9623    .6159    .4800    .7500
.37    .63    .2331    .9799    .6173    .4828    .7664
.38    .62    .2356    .9979    .6187    .4854    .7829
.39    .61    .2379    1.0164   .6200    .4877    .7996
.40    .60    .2400    1.0355   .6214    .4899    .8165
.41    .59    .2419    1.0548   .6222    .4918    .8336
.42    .58    .2436    1.0744   .6230    .4935    .8509
.43    .57    .2451    1.0947   .6241    .4951    .8686
.44    .56    .2464    1.1156   .6247    .4964    .8864
.45    .55    .2475    1.1369   .6254    .4975    .9045
.46    .54    .2484    1.1590   .6258    .4984    .9230
.47    .53    .2491    1.1815   .6262    .4991    .9417
.48    .52    .2496    1.2048   .6265    .4996    .9608
.49    .51    .2499    1.2287   .6266    .4999    .9802
.50    .50    .2500    1.2534   .6266    .5000    1.0000


* This table was developed by E. K. Taylor of the Adjutant General's Office, War
Department, and is reproduced by permission.
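Columns (3) to (7) of Table VII depend only on p and on y, the ordinate of the unit normal curve at the point that divides its area into the proportions p and q. The sketch below assumes the column reading p/y, pq/y, √pq and √(p/q), which the printed values bear out, and uses Python's `statistics.NormalDist`; its results may differ from the printed entries in the last place, particularly for small p. The function name and keys are hypothetical.

```python
from statistics import NormalDist

def biserial_helpers(p):
    """Columns (2)-(7) of Table VII for a proportion p, with 0 < p < 1."""
    q = 1 - p
    unit = NormalDist()  # mean 0, standard deviation 1
    y = unit.pdf(unit.inv_cdf(q))  # ordinate at the point dividing the area into p and q
    return {
        "q": q,
        "pq": p * q,
        "p_over_y": p / y,
        "pq_over_y": p * q / y,
        "sqrt_pq": (p * q) ** 0.5,
        "sqrt_p_over_q": (p / q) ** 0.5,
    }

for p in (0.10, 0.20, 0.50):
    h = biserial_helpers(p)
    print(f"p = {p:.2f}  p/y = {h['p_over_y']:.4f}  pq/y = {h['pq_over_y']:.4f}")
```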


















APPENDIX C 


Tables of Squares, Square Roots, 
Reciprocals, and Random Numbers 


Table 

I. Squares, Square Roots, and Reciprocals of Integers from 1 to 1000 
II. A Table of Random Numbers 







Table I 


SQUARES, SQUARE ROOTS, AND RECIPROCALS OF INTEGERS FROM 1 TO 1000

n     n²       √n        1/n        1/√n
1     1        1.0000    1.000000   1.0000
2     4        1.4142    .500000    .7071
3     9        1.7321    .333333    .5774
4     16       2.0000    .250000    .5000
5     25       2.2361    .200000    .4472
6     36       2.4495    .166667    .4082
7     49       2.6458    .142857    .3780
8     64       2.8284    .125000    .3536
9     81       3.0000    .111111    .3333
10    100      3.1623    .100000    .3162
11    121      3.3166    .090909    .3015
12    144      3.4641    .083333    .2887
13    169      3.6056    .076923    .2774
14    196      3.7417    .071429    .2673
15    225      3.8730    .066667    .2582
16    256      4.0000    .062500    .2500
17    289      4.1231    .058824    .2425
18    324      4.2426    .055556    .2357
19    361      4.3589    .052632    .2294
20    400      4.4721    .050000    .2236
21    441      4.5826    .047619    .2182
22    484      4.6904    .045455    .2132
23    529      4.7958    .043478    .2085
24    576      4.8990    .041667    .2041
25    625      5.0000    .040000    .2000
26    676      5.0990    .038462    .1961
27    729      5.1962    .037037    .1925
28    784      5.2915    .035714    .1890
29    841      5.3852    .034483    .1857
30    900      5.4772    .033333    .1826
31    961      5.5678    .032258    .1796
32    1024     5.6569    .031250    .1768
33    1089     5.7446    .030303    .1741
34    1156     5.8310    .029412    .1715
35    1225     5.9161    .028571    .1690
36    1296     6.0000    .027778    .1667
37    1369     6.0828    .027027    .1644
38    1444     6.1644    .026316    .1622
39    1521     6.2450    .025641    .1601
40    1600     6.3246    .025000    .1581
41    1681     6.4031    .024390    .1562
42    1764     6.4807    .023810    .1543
43    1849     6.5574    .023256    .1525
44    1936     6.6332    .022727    .1508
45    2025     6.7082    .022222    .1491
46    2116     6.7823    .021739    .1474
47    2209     6.8557    .021277    .1459
48    2304     6.9282    .020833    .1443
49    2401     7.0000    .020408    .1429
50    2500     7.0711    .020000    .1414





Table I (continued)


n     n²       √n        1/n        1/√n
51    2601     7.1414    .019608    .1400
52    2704     7.2111    .019231    .1387
53    2809     7.2801    .018868    .1374
54    2916     7.3485    .018519    .1361
55    3025     7.4162    .018182    .1348
56    3136     7.4833    .017857    .1336
57    3249     7.5498    .017544    .1325
58    3364     7.6158    .017241    .1313
59    3481     7.6811    .016949    .1302
60    3600     7.7460    .016667    .1291
61    3721     7.8102    .016393    .1280
62    3844     7.8740    .016129    .1270
63    3969     7.9373    .015873    .1260
64    4096     8.0000    .015625    .1250
65    4225     8.0623    .015385    .1240
66    4356     8.1240    .015152    .1231
67    4489     8.1854    .014925    .1222
68    4624     8.2462    .014706    .1213
69    4761     8.3066    .014493    .1204
70    4900     8.3666    .014286    .1195
71    5041     8.4261    .014085    .1187
72    5184     8.4853    .013889    .1179
73    5329     8.5440    .013699    .1170
74    5476     8.6023    .013514    .1162
75    5625     8.6603    .013333    .1155
76    5776     8.7178    .013158    .1147
77    5929     8.7750    .012987    .1140
78    6084     8.8318    .012821    .1132
79    6241     8.8882    .012658    .1125
80    6400     8.9443    .012500    .1118
81    6561     9.0000    .012346    .1111
82    6724     9.0554    .012195    .1104
83    6889     9.1104    .012048    .1098
84    7056     9.1652    .011905    .1091
85    7225     9.2195    .011765    .1085
86    7396     9.2736    .011628    .1078
87    7569     9.3274    .011494    .1072
88    7744     9.3808    .011364    .1066
89    7921     9.4340    .011236    .1060
90    8100     9.4868    .011111    .1054
91    8281     9.5394    .010989    .1048
92    8464     9.5917    .010870    .1043
93    8649     9.6437    .010753    .1037
94    8836     9.6954    .010638    .1031
95    9025     9.7468    .010526    .1026
96    9216     9.7980    .010417    .1021
97    9409     9.8489    .010309    .1015
98    9604     9.8995    .010204    .1010
99    9801     9.9499    .010101    .1005
100   10000    10.0000   .010000    .1000


Table I (continued)


n     n²       √n        1/n        1/√n
101   10201    10.0499   .009901    .0995
102   10404    10.0995   .009804    .0990
103   10609    10.1489   .009709    .0985
104   10816    10.1980   .009615    .0981
105   11025    10.2470   .009524    .0976
106   11236    10.2956   .009434    .0971
107   11449    10.3441   .009346    .0967
108   11664    10.3923   .009259    .0962
109   11881    10.4403   .009174    .0958
110   12100    10.4881   .009091    .0953
111   12321    10.5357   .009009    .0949
112   12544    10.5830   .008929    .0945
113   12769    10.6301   .008850    .0941
114   12996    10.6771   .008772    .0937
115   13225    10.7238   .008696    .0933
116   13456    10.7703   .008621    .0928
117   13689    10.8167   .008547    .0925
118   13924    10.8628   .008475    .0921
119   14161    10.9087   .008403    .0917
120   14400    10.9545   .008333    .0913
121   14641    11.0000   .008264    .0909
122   14884    11.0454   .008197    .0905
123   15129    11.0905   .008130    .0902
124   15376    11.1355   .008065    .0898
125   15625    11.1803   .008000    .0894
126   15876    11.2250   .007937    .0891
127   16129    11.2694   .007874    .0887
128   16384    11.3137   .007813    .0884
129   16641    11.3578   .007752    .0880
130   16900    11.4018   .007692    .0877
131   17161    11.4455   .007634    .0874
132   17424    11.4891   .007576    .0870
133   17689    11.5326   .007519    .0867
134   17956    11.5758   .007463    .0864
135   18225    11.6190   .007407    .0861
136   18496    11.6619   .007353    .0857
137   18769    11.7047   .007299    .0854
138   19044    11.7473   .007246    .0851
139   19321    11.7898   .007194    .0848
140   19600    11.8322   .007143    .0845
141   19881    11.8743   .007092    .0842
142   20164    11.9164   .007042    .0839
143   20449    11.9583   .006993    .0836
144   20736    12.0000   .006944    .0833
145   21025    12.0416   .006897    .0830
146   21316    12.0830   .006849    .0828
147   21609    12.1244   .006803    .0825
148   21904    12.1655   .006757    .0822
149   22201    12.2066   .006711    .0819
150   22500    12.2474   .006667    .0816





Table I (continued)


n     n²       √n        1/n        1/√n
151   22801    12.2882   .006623    .0814
152   23104    12.3288   .006579    .0811
153   23409    12.3693   .006536    .0808
154   23716    12.4097   .006494    .0806
155   24025    12.4499   .006452    .0803
156   24336    12.4900   .006410    .0801
157   24649    12.5300   .006369    .0798
158   24964    12.5698   .006329    .0796
159   25281    12.6095   .006289    .0793
160   25600    12.6491   .006250    .0791
161   25921    12.6886   .006211    .0788
162   26244    12.7279   .006173    .0786
163   26569    12.7671   .006135    .0783
164   26896    12.8062   .006098    .0781
165   27225    12.8452   .006061    .0778
166   27556    12.8841   .006024    .0776
167   27889    12.9228   .005988    .0774
168   28224    12.9615   .005952    .0772
169   28561    13.0000   .005917    .0769
170   28900    13.0384   .005882    .0767
171   29241    13.0767   .005848    .0765
172   29584    13.1149   .005814    .0762
173   29929    13.1529   .005780    .0760
174   30276    13.1909   .005747    .0758
175   30625    13.2288   .005714    .0756
176   30976    13.2665   .005682    .0754
177   31329    13.3041   .005650    .0752
178   31684    13.3417   .005618    .0750
179   32041    13.3791   .005587    .0747
180   32400    13.4164   .005556    .0745
181   32761    13.4536   .005525    .0743
182   33124    13.4907   .005495    .0741
183   33489    13.5277   .005464    .0739
184   33856    13.5647   .005435    .0737
185   34225    13.6015   .005405    .0735
186   34596    13.6382   .005376    .0733
187   34969    13.6748   .005348    .0731
188   35344    13.7113   .005319    .0729
189   35721    13.7477   .005291    .0727
190   36100    13.7840   .005263    .0725
191   36481    13.8203   .005236    .0724
192   36864    13.8564   .005208    .0722
193   37249    13.8924   .005181    .0720
194   37636    13.9284   .005155    .0718
195   38025    13.9642   .005128    .0716
196   38416    14.0000   .005102    .0714
197   38809    14.0357   .005076    .0712
198   39204    14.0712   .005051    .0711
199   39601    14.1067   .005025    .0709
200   40000    14.1421   .005000    .0707





Table I (continued)


n 

fi » 

ViT 

1 

n 

1 

Vi, 

201 

40401 

14.1774 

.004975 

.0705 

202 

40804 

14.2127 

.004950 

.0704 

203 

41209 

14.2478 

.004926 

.0702 

204 

41616 

14.2829 

.004902 

.0700 

20 ft 

42025 

14.3178 

.004878 

.0698 

206 

42436 

14.3527 

.004854 

.0697 

207 

42849 

14.3875 

.004831 

.0695 

208 

43264 

14.4222 

.004808 

.0693 

209 

43681 

14.4568 

.004785 

.0692 

210 

44100 

14.4914 

.004762 

.0690 

211 

44521 

14.5258 

.004739 

.0688 

212 

44944 

14.5602 

.004717 

.0687 

213

45369 

14.5945 

.004695 

.0685 

214 

45796 

14.6287 

.004673 

.0684 

215

46225 

14.6629 

.004651 

.0682 

216 

46656 

14.6969 

.004630 

.0680 

217 

47089 

14.7309 

.004608 

.0679 

218 

47524 

14.7648 

.004587 

.0677 

219 

47961 

14.7986 

.004566 

.0676 

220 

48400 

14.8324 

.004545 

.0674 

221 

48841 

14.8661 

.004525 

.0673 

222 

49284 

14.8997 

.004505 

.0671 

223

49729 

14.9332 

.004484

.0670 

224 

50176 

14.9666 

.004464 

.0668 

225

50625 

15.0000 

.004444 

.0667 

226 

51076 

15.0333 

.004425 

.0665 

227 

51529 

15.0665 

.004405 

.0664 

228 

51984 

15.0997 

.004386 

.0662 

229 

52441 

15.1327 

.004367 

.0661 

230

52900 

15.1658 

.004348 

.0659 

231

53361 

15.1987 

.004329 

.0658 

232 

53824 

15.2315 

.004310 

.0657 

233

54289 

15.2643 

.004292 

.0655 

234

54756 

15.2971 

.004274 

.0654 

235

55225 

15.3297 

.004255 

.0652 

236

55696 

15.3623 

.004237 

.0651 

237

56169 

15.3948 

.004219 

.0650 

238

56644 

15.4272 

.004202 

.0648 

239

57121 

15.4596 

.004184 

.0647 

240 

57600 

15.4919 

.004167 

.0645 

241 

58081 

15.5242 

.004149 

.0644 

242 

58564 

15.5563 

.004132 

.0643 

243

59049 

15.5885 

.004115 

.0642 

244 

59536 

15.6205 

.004098 

.0640 

245

60025 

15.6525 

.004082 

.0639 

246 

60516 

15.6844 

.004065 

.0638 

247 

61009 

15.7162 

.004049 

.0636 

248 

61504 

15.7480

.004032 

.0635 

249 

62001 

15.7797 

.004016 

.0634 

250

62500 

15.8114 

.004000 

.0632 





SQUARES, SQUARE ROOTS, AND RECIPROCALS 
Table I (continued)

n

n²

√n

1/n

1/√n

251 

63001 

15.8430 

.003984 

.0631 

252

63504 

15.8745 

.003968 

.0630 

253 

64009 

15.9060 

.003953 

.0629 

254 

64516 

15.9374 

.003937 

.0627 

255 

65025 

15.9687 

.003922 

.0626 

256 

65536 

16.0000 

.003906 

.0625 

257 

66049 

16.0312 

.003891 

.0624 

258 

66564 

16.0624 

.003876 

.0623 

259 

67081

16.0935 

.003861 

.0621 

260 

67600 

16.1245 

.003846 

.0620 

261 

68121 

16.1555 

.003831 

.0619 

262 

68644 

16.1864 

.003817 

.0618 

263 

69169 

16.2173 

.003802 

.0617 

264 

69696 

16.2481 

.003788 

.0615 

265 

70225 

16.2788 

.003774 

.0614 

266 

70756 

16.3095 

.003759 

.0613 

267 

71289 

16.3401 

.003745 

.0612 

268 

71824 

16.3707 

.003731 

.0611 

269 

72361 

16.4012 

.003717 

.0610 

270 

72900 

16.4317 

.003704 

.0609 

271 

73441 

16.4621 

.003690 

.0607 

272 

73984 

16.4924 

.003676 

.0606 

273 

74529 

16.5227 

.003663 

.0605 

274 

75076 

16.5529 

.003650 

.0604 

275 

75625 

16.5831 

.003636 

.0603 

276 

76176 

16.6132 

.003623 

.0602 

277 

76729 

16.6433 

.003610 

.0601 

278 

77284 

16.6733 

.003597 

.0600 

279 

77841 

16.7033 

.003584 

.0599 

280 

78400 

16.7332 

.003571 

.0598 

281 

78961 

16.7631

.003559 

.0597 

282 

79524 

16.7929 

.003546 

.0595 

283 

80089 

16.8226 

.003534 

.0594 

284 

80656 

16.8523 

.003521 

.0593 

285 

81225 

16.8819 

.003509 

.0592 

286 

81796 

16.9115 

.003497 

.0591 

287 

82369 

16.9411 

.003484 

.0590 

288 

82944 

16.9706 

.003472 

.0589 

289 

83521 

17.0000 

.003460 

.0588 

290 

84100 

17.0294 

.003448 

.0587 

291 

84681 

17.0587 

.003436 

.0586 

292 

85264 

17.0880 

.003425 

.0585 

293 

85849 

17.1172 

.003413 

.0584 

294 

86436 

17.1464 

.003401 

.0583 

295 

87025 

17.1756 

.003390 

.0582 

296 

87616 

17.2047 

.003378 

.0581 

297 

88209 

17.2337 

.003367 

.0580 

298 

88804 

17.2627 

.003356 

.0579 

299 

89401 

17.2916 

.003344 

.0578 

300 

90000 

17.3205 

.003333 

.0577 



Table I (continued)

n

n²

√n

1/n

1/√n

301 

90601 

17.3494 

.003322 

.0576 

302

91204 

17.3781 

.003311 

.0575

303

91809

17.4069 

.003300 

.0574 

304 

92416 

17.4356 

.003289 

.0574 

305

93025 

17.4642 

.003279 

.0573 

306 

93636 

17.4929 

.003268 

.0572

307

94249

17.5214

.003257 

.0571 

308

94864 

17.5499 

.003247 

.0570 

309 

95481 

17.5784 

.003236 

.0569 

310 

96100 

17.6068 

.003226 

.0568 

311 

96721 

17.6352 

.003215 

.0567 

312 

97344 

17.6635 

.003205 

.0566 

313 

97969 

17.6918 

.003195 

.0565 

314 

98596 

17.7200 

.003185 

.0564 

315

99225 

17.7482 

.003175 

.0563 

316 

99856 

17.7764 

.003165 

.0563 

317 

100489 

17.8045 

.003155 

.0562 

318 

101124 

17.8326 

.003145 

.0561 

319

101761 

17.8606 

.003135 

.0560 

320 

102400 

17.8885 

.003125 

.0559

321 

103041 

17.9165 

.003115 

.0558 

322 

103684 

17.9444 

.003106 

.0557 

323 

104329 

17.9722 

.003096 

.0556 

324 

104976 

18.0000 

.003086 

.0556 

325

105625 

18.0278 

.003077 

.0555 

326 

106276 

18.0555 

.003067 

.0554 

327 

106929 

18.0831 

.003058 

.0553 

328 

107584 

18.1108 

.003049 

.0552 

329 

108241 

18.1384 

.003040 

.0551 

330 

108900 

18.1659 

.003030 

.0550

331 

109561 

18.1934 

.003021 

.0550 

332 

110224 

18.2209 

.003012 

.0549 

333 

110889 

18.2483 

.003003 

.0548 

334 

111556 

18.2757 

.002994 

.0547 

335

112225 

18.3030 

.002985 

.0546 

336 

112896 

18.3303 

.002976 

.0546 

337 

113569 

18.3576 

.002967 

.0545 

338 

114244 

18.3848 

.002959 

.0544 

339 

114921 

18.4120 

.002950 

.0543 

340 

115600 

18.4391 

.002941 

.0542 

341 

116281 

18.4662 

.002933 

.0542 

342 

116964 

18.4932 

.002924 

.0541 

343 

117649 

18.5203 

.002915 

.0540 

344 

118336 

18.5472 

.002907 

.0539 

345

119025 

18.5742 

.002899 

.0538 

346 

119716 

18.6011 

.002890 

.0538 

347 

120409 

18.6279 

.002882 

.0537 

348 

121104 

18.6548 

.002874 

.0536 

349 

121801 

18.6815 

.002865 

.0535 

350

122500 

18.7083 

.002857 

.0535 
































Table I (continued)

n

n²

√n

1/n

1/√n

351

123201 

18.7350

.002849 

.0534 

352

123904 

18.7617

.002841 

.0533 

353

124609 

18.7883 

.002833 

.0532 

354

125316 

18.8149 

.002825 

.0531 

355

126025 

18.8414 

.002817 

.0531 

356

126736 

18.8680 

.002809 

.0530 

357

127449 

18.8944 

.002801 

.0529 

358

128164 

18.9209 

.002793 

.0529 

359

128881 

18.9473 

.002786 

.0528 

360 

129600 

18.9737 

.002778 

.0527 

361 

130321 

19.0000 

.002770 

.0526 

362 

131044 

19.0263 

.002762 

.0526 

363 

131769 

19.0526 

.002755 

.0525 

364 

132496 

19.0788 

.002747 

.0524 

365 

133225 

19.1050 

.002740 

.0523 

366 

133956 

19.1311 

.002732 

.0523 

367 

134689 

19.1572 

.002725 

.0522 

368 

135424 

19.1833 

.002717 

.0521 

369 

136161 

19.2094 

.002710 

.0521 

370 

136900 

19.2354 

.002703 

.0520 

371 

137641 

19.2614 

.002695 

.0519 

372 

138384 

19.2873 

.002688 

.0518 

373 

139129 

19.3132 

.002681 

.0518 

374 

139876 

19.3391 

.002674 

.0517 

375

140625 

19.3649 

.002667 

.0516 

376 

141376 

19.3907 

.002660 

.0516 

377 

142129 

19.4165 

.002653 

.0515 

378 

142884 

19.4422 

.002646 

.0514 

379 

143641 

19.4679 

.002639 

.0514 

380 

144400 

19.4936 

.002632 

.0513 

381 

145161 

19.5192 

.002625 

.0512 

382 

145924 

19.5448 

.002618 

.0512 

383 

146689 

19.5704 

.002611 

.0511 

384 

147456 

19.5959 

.002604 

.0510 

385

148225 

19.6214 

.002597 

.0510 

386 

148996 

19.6469 

.002591 

.0509 

387 

149769 

19.6723 

.002584 

.0508 

388 

150544 

19.6977 

.002577 

.0508 

389 

151321 

19.7231 

.002571 

.0507 

390 

152100 

19.7484 

.002564 

.0506 

391 

152881 

19.7737 

.002558 

.0506 

392 

153664 

19.7990 

.002551 

.0505 

393 

154449 

19.8242 

.002545 

.0504 

394 

155236 

19.8494 

.002538 

.0504 

395

156025 

19.8746 

.002532 

.0503 

396 

156816 

19.8997 

.002525 

.0503 

397 

157609 

19.9249 

.002519 

.0502 

398 

158404 

19.9499 

.002513 

.0501 

399 

159201 

19.9750 

.002506 

.0501 

400 

160000 

20.0000

.002500 

.0500 



Table I (continued)

n

n²

√n

1/n

1/√n

401 

160801 

20.0250 

.002494 

.0499 

402 

161604 

20.0499 

.002488 

.0499 

403 

162409 

20.0749 

.002481 

.0498 

404 

163216 

20.0998 

.002475 

.0498 

405

164025 

20.1246 

.002469 

.0497 

406 

164836 

20.1494 

.002463 

.0496 

407 

165649 

20.1742 

.002457 

.0496 

408 

166464 

20.1990 

.002451 

.0495 

409

167281 

20.2237 

.002445 

.0494 

410 

168100 

20.2485 

.002439 

.0494 

411 

168921 

20.2731 

.002433 

.0493 

412 

169744 

20.2978 

.002427 

.0493 

413 

170569 

20.3224 

.002421 

.0492 

414 

171396 

20.3470 

.002415 

.0491 

415

172225 

20.3715

.002410 

.0491 

416 

173056 

20.3961 

.002404 

.0490 

417 

173889 

20.4206 

.002398 

.0490 

418 

174724 

20.4450 

.002392 

.0489 

419 

175561 

20.4695 

.002387 

.0489 

420 

176400 

20.4939 

.002381 

.0488 

421 

177241 

20.5183 

.002375 

.0487 

422 

178084 

20.5426 

.002370 

.0487 

423 

178929 

20.5670 

.002364 

.0486 

424 

179776 

20.5913 

.002358 

.0486 

425

180625 

20.6155 

.002353 

.0485 

426 

181476 

20.6398 

.002347 

.0485 

427 

182329 

20.6640 

.002342 

.0484 

428 

183184 

20.6882 

.002336 

.0483 

429 

184041 

20.7123 

.002331 

.0483 

430 

184900 

20.7364

.002326 

.0482 

431 

185761 

20.7605

.002320 

.0482 

432 

186624 

20.7846 

.002315 

.0481 

433

187489 

20.8087 

.002309 

.0481 

434 

188356 

20.8327 

.002304 

.0480 

435

189225 

20.8567 

.002299 

.0479 

436 

190096 

20.8806 

.002294 

.0479 

437 

190969 

20.9045 

.002288 

.0478 

438 

191844 

20.9284 

.002283 

.0478 

439 

192721 

20.9523 

.002278 

.0477 

440 

193600 

20.9762 

.002273 

.0477 

441 

194481 

21.0000 

.002268 

.0476 

442 

195364 

21.0238 

.002262 

.0476 

443

196249 

21.0476 

.002257 

.0475 

444 

197136 

21.0713 

.002252 

.0475 

445

198025 

21.0950 

.002247 

.0474 

446 

198916 

21.1187 

.002242 

.0474 

447 

199809 

21.1424

.002237 

.0473 

448 

200704 

21.1660 

.002232 

.0472 

449

201601 

21.1896 

.002227 

.0472 

450

202500 

21.2132

.002222 

.0471 





Table I (continued)

n

n²

√n

1/n

1/√n

451

203401 

21.2368 

.002217 

.0471 

452 

204304 

21.2603 

.002212 

.0470 

453

205209

21.2838 

.002208 

.0470 

454

206116 

21.3073 

.002203 

.0469 

455

207025 

21.3307 

.002198 

.0469 

456

207936 

21.3542 

.002193 

.0468 

457

208849 

21.3776 

.002188 

.0468 

458

209764 

21.4009 

.002183 

.0467 

459

210681 

21.4243 

.002179 

.0467 

460 

211600 

21.4476 

.002174 

.0466 

461 

212521

21.4709 

.002169 

.0466 

462 

213444 

21.4942 

.002165 

.0465 

463 

214369 

21.5174 

.002160 

.0465 

464 

215296 

21.5407 

.002155 

.0464 

465

216225 

21.5639 

.002151 

.0464 

466 

217156 

21.5870 

.002146 

.0463 

467 

218089 

21.6102 

.002141 

.0463 

468 

219024 

21.6333 

.002137 

.0462 

469 

219961 

21.6564 

.002132 

.0462 

470 

220900 

21.6795 

.002128 

.0461 

471 

221841 

21.7025 

.002123 

.0461 

472 

222784 

21.7256 

.002119

.0460 

473 

223729 

21.7486 

.002114 

.0460 

474 

224676 

21.7715 

.002110 

.0459 

475

225625 

21.7945 

.002105 

.0459 

476 

226576 

21.8174 

.002101 

.0458 

477 

227529 

21.8403 

.002096 

.0458 

478 

228484 

21.8632 

.002092 

.0457 

479 

229441 

21.8861 

.002088 

.0457 

480 

230400 

21.9089 

.002083 

.0456 

481 

231361 

21.9317 

.002079 

.0456 

482 

232324 

21.9545 

.002075 

.0455 

483 

233289 

21.9773 

.002070 

.0455 

484 

234256 

22.0000 

.002066 

.0455 

485

235225 

22.0227 

.002062 

.0454 

486 

236196 

22.0454 

.002058 

.0454 

487 

237169 

22.0681 

.002053 

.0453 

488 

238144 

22.0907 

.002049 

.0453 

489 

239121 

22.1133 

.002045 

.0452 

490 

240100 

22.1359 

.002041 

.0452 

491 

241081 

22.1585 

.002037 

.0451 

492 

242064 

22.1811 

.002033 

.0451 

493 

243049 

22.2036 

.002028 

.0450 

494 

244036 

22.2261 

.002024 

.0450 

495 

245025 

22.2486 

.002020 

.0449 

496 

246016 

22.2711 

.002016 

.0449

497 

247009 

22.2935 

.002012 

.0449 

498 

248004 

22.3159 

.002008 

.0448

499 

249001 

22.3383 

.002004 

.0448 

500

250000 

22.3607 

.002000 

.0447 


Table I (continued)

n

n²

√n

1/n

1/√n

501 

251001 

22.3830 

.001996 

.0447 

502

252004 

22.4054 

.001992 

.0446 

503 

253009 

22.4277 

.001988

.0446 

504 

254016 

22.4499 

.001984 

.0445 

505 

255025 

22.4722 

.001980 

.0445 

506 

256036 

22.4944 

.001976 

.0445 

507 

257049 

22.5167

.001972 

.0444 

508

258064 

22.5389 

.001969 

.0444 

509

259081 

22.5610 

.001965 

.0443 

510

260100 

22.5832 

.001961 

.0443 

511 

261121 

22.6053

.001957 

.0442 

512

262144 

22.6274 

.001953 

.0442 

513 

263169 

22.6495 

.001949 

.0442 

514 

264196 

22.6716 

.001946 

.0441 

515 

265225 

22.6936 

.001942 

.0441 

516 

266256 

22.7156 

.001938 

.0440 

517 

267289 

22.7376

.001934 

.0440 

518 

268324 

22.7596 

.001931 

.0439 

519 

269361 

22.7816 

.001927 

.0439 

520

270400 

22.8035 

.001923 

.0439 

521

271441 

22.8254 

.001919 

.0438 

522

272484 

22.8473 

.001916 

.0438 

523

273529 

22.8692 

.001912 

.0437 

524

274576 

22.8910 

.001908 

.0437 

525

275625 

22.9129 

.001905 

.0436 

526

276676 

22.9347 

.001901 

.0436 

527

277729 

22.9565 

.001898 

.0436 

528

278784 

22.9783 

.001894 

.0435 

529

279841 

23.0000 

.001890 

.0435 

530 

280900 

23.0217 

.001887 

.0434 

531 

281961 

23.0434 

.001883 

.0434 

532

283024 

23.0651 

.001880 

.0434 

533 

284089 

23.0868 

.001876 

.0433 

534 

285156 

23.1084 

.001873 

.0433 

535 

286225 

23.1301 

.001869 

.0432 

536 

287296 

23.1517 

.001866 

.0432 

537 

288369 

23.1733 

.001862 

.0432 

538 

289444 

23.1948 

.001859 

.0431 

539 

290521 

23.2164 

.001855 

.0431 

540 

291600 

23.2379 

.001852 

.0430 

541 

292681 

23.2594 

.001848 

.0430 

542

293764 

23.2809 

.001845 

.0430 

543 

294849 

23.3024 

.001842 

.0429 

544 

295936 

23.3238 

.001838 

.0429 

545 

297025 

23.3452 

.001835 

.0428 

546 

298116 

23.3666 

.001832 

.0428 

547 

299209 

23.3880 

.001828 

.0428 

548 

300304 

23.4094 

.001825 

.0427 

549 

301401 

23.4307 

.001821 

.0427 

550 

302500 

23.4521 

.001818 

.0426 





Table I (continued)

n

n²

√n

1/n

1/√n

551

303601 

23.4734 

.001815 

.0426 

552

304704 

23.4947 

.001812 

.0426 

553

305809 

23.5160 

.001808 

.0425 

554

306916 

23.5372 

.001805 

.0425 

555

308025 

23.5584 

.001802 

.0424 

556

309136 

23.5797 

.001799 

.0424 

557

310249 

23.6008 

.001795 

.0424 

558

311364 

23.6220 

.001792 

.0423 

559

312481 

23.6432 

.001789 

.0423 

560

313600 

23.6643 

.001786 

.0423 

561

314721 

23.6854 

.001783 

.0422 

562

315844 

23.7065 

.001779 

.0422 

563

316969 

23.7276 

.001776 

.0421 

564

318096 

23.7487 

.001773 

.0421 

565

319225 

23.7697 

.001770 

.0421 

566

320356 

23.7908 

.001767 

.0420 

567

321489 

23.8118 

.001764 

.0420 

568

322624 

23.8328 

.001761 

.0420 

569

323761 

23.8537 

.001757 

.0419 

570

324900 

23.8747 

.001754 

.0419 

571

326041 

23.8956 

.001751 

.0418 

572

327184 

23.9165 

.001748 

.0418 

573

328329 

23.9374 

.001745 

.0418 

574

329476 

23.9583 

.001742 

.0417 

575

330625 

23.9792 

.001739 

.0417 

576

331776 

24.0000 

.001736 

.0417 

577

332929 

24.0208 

.001733 

.0416 

578

334084 

24.0416 

.001730 

.0416 

579

335241 

24.0624 

.001727 

.0416 

580

336400 

24.0832 

.001724 

.0415 

581

337561 

24.1039 

.001721 

.0415 

582

338724 

24.1247 

.001718 

.0415 

583

339889 

24.1454 

.001715 

.0414 

584

341056 

24.1661 

.001712 

.0414 

585

342225 

24.1868 

.001709 

.0413 

586

343396 

24.2074 

.001706 

.0413 

587

344569 

24.2281 

.001704 

.0413 

588

345744 

24.2487 

.001701 

.0412 

589

346921 

24.2693 

.001698 

.0412 

590

348100 

24.2899 

.001695 

.0412 

591

349281 

24.3105 

.001692 

.0411 

592

350464 

24.3311 

.001689 

.0411 

593 

351649 

24.3516 

.001686 

.0411 

594

352836 

24.3721 

.001684 

.0410 

595

354025 

24.3926 

.001681 

.0410 

596

355216 

24.4131 

.001678 

.0410 

597

356409 

24.4336 

.001675 

.0409 

598

357604 

24.4540 

.001672 

.0409 

599

358801 

24.4745 

.001669 

.0409 

600 

360000 

24.4949 

.001667 

.0408 



Table I (continued)

n

n²

√n

1/n

1/√n

601 

361201 

24.5153 

.001664 

.0408 

602 

362404 

24.5357 

.001661 

.0408 

603 

363609 

24.5561 

.001658 

.0407 

604 

364816 

24.5764 

.001656

.0407 

605

366025 

24.5967 

.001653 

.0407 

606 

367236 

24.6171 

.001650 

.0406 

607 

368449 

24.6374 

.001647 

.0406 

608 

369664 

24.6577 

.001645 

.0406 

609

370881 

24.6779 

.001642 

.0405 

610 

372100 

24.6982 

.001639 

.0405 

611 

373321 

24.7184 

.001637 

.0405 

612 

374544 

24.7386 

.001634 

.0404 

613 

375769 

24.7588 

.001631 

.0404 

614 

376996 

24.7790 

.001629 

.0404 

615 

378225 

24.7992 

.001626 

.0403 

616 

379456 

24.8193 

.001623 

.0403 

617 

380689 

24.8395 

.001621 

.0403 

618 

381924 

24.8596 

.001618 

.0402 

619 

383161 

24.8797 

.001616 

.0402 

620 

384400 

24.8998 

.001613 

.0402 

621 

385641 

24.9199 

.001610 

.0401 

622 

386884 

24.9399 

.001608 

.0401 

623 

388129 

24.9600 

.001605 

.0401 

624 

389376 

24.9800 

.001603 

.0400 

625 

390625 

25.0000 

.001600 

.0400 

626 

391876 

25.0200 

.001597 

.0400 

627 

393129 

25.0400 

.001595 

.0399 

628 

394384 

25.0599 

.001592 

.0399 

629 

395641 

25.0799 

.001590 

.0399 

630 

396900 

25.0998 

.001587 

.0398 

631 

398161 

25.1197 

.001585 

.0398 

632 

399424 

25.1396 

.001582 

.0398 

633 

400689 

25.1595 

.001580 

.0397 

634 

401956 

25.1794 

.001577 

.0397 

635 

403225 

25.1992 

.001575 

.0397

636 

404496 

25.2190 

.001572 

.0397 

637 

405769 

25.2389 

.001570 

.0396 

638 

407044 

25.2587 

.001567 

.0396 

639 

408321 

25.2784 

.001565 

.0396 

640 

409600 

25.2982 

.001563 

.0395 

641 

410881 

25.3180 

.001560 

.0395 

642 

412164 

25.3377

.001558 

.0395 

643 

413449 

25.3574 

.001555 

.0394 

644 

414736 

25.3772 

.001553 

.0394 

645 

416025 

25.3969 

.001550 

.0394 

646 

417316 

25.4165

.001548 

.0393 

647 

418609 

25.4362 

.001546 

.0393 

648 

419904 

25.4558 

.001543 

.0393

649 

421201 

25.4755 

.001541 

.0393 

650 

422500 

25.4951 

.001538 

.0392 





Table I (continued)

n

n²

√n

1/n

1/√n

651

423801

25.5147

.001536

.0392

652

425104

25.5343

.001534

.0392

653

426409

25.5539

.001531

.0391

654

427716

25.5734

.001529

.0391

655

429025

25.5930

.001527

.0391

656

430336

25.6125

.001524

.0390

657

431649

25.6320

.001522

.0390

658

432964

25.6515

.001520

.0390

659

434281

25.6710

.001517

.0390

660

435600

25.6905

.001515

.0389

661

436921

25.7099

.001513

.0389

662

438244

25.7294

.001511

.0389

663

439569

25.7488

.001508

.0388

664

440896

25.7682

.001506

.0388

665

442225

25.7876

.001504

.0388

666

443556

25.8070

.001502

.0387

667

444889

25.8263

.001499

.0387

668

446224

25.8457

.001497

.0387

669

447561

25.8650

.001495

.0387

670

448900

25.8844

.001493

.0386

671

450241

25.9037

.001490

.0386

672

451584

25.9230

.001488

.0386

673

452929

25.9422

.001486

.0385

674

454276

25.9615

.001484

.0385

675

455625

25.9808

.001481

.0385

676

456976

26.0000

.001479

.0385

677

458329

26.0192

.001477

.0384

678

459684

26.0384

.001475

.0384

679

461041

26.0576

.001473

.0384

680

462400

26.0768

.001471

.0383

681

463761

26.0960

.001468

.0383

682

465124

26.1151

.001466

.0383

683

466489

26.1343

.001464

.0383

684

467856

26.1534

.001462

.0382

685

469225

26.1725

.001460

.0382

686

470596

26.1916

.001458

.0382

687

471969

26.2107

.001456

.0382

688

473344

26.2298

.001453

.0381

689

474721

26.2488

.001451

.0381

690

476100

26.2679

.001449

.0381

691

477481

26.2869

.001447

.0380

692

478864

26.3059

.001445

.0380

693

480249

26.3249

.001443

.0380

694

481636

26.3439

.001441

.0380

695

483025

26.3629

.001439

.0379

696

484416

26.3818

.001437

.0379

697

485809

26.4008

.001435

.0379

698

487204

26.4197

.001433

.0379

699

488601

26.4386

.001431

.0378

700

490000

26.4575

.001429

.0378





































Table I (continued)

n

n²

√n

1/n

1/√n

701

491401

26.4764

.001427

.0378

702

492804

26.4953

.001425

.0377

703

494209

26.5141

.001422

.0377

704

495616

26.5330

.001420

.0377

705

497025

26.5518

.001418

.0377

706

498436

26.5707

.001416

.0376

707

499849

26.5895

.001414

.0376

708

501264

26.6083

.001412

.0376

709

502681

26.6271

.001410

.0376

710

504100

26.6458

.001408

.0375

711

505521

26.6646

.001406

.0375

712

506944

26.6833

.001404

.0375

713

508369

26.7021

.001403

.0375

714

509796

26.7208

.001401

.0374

715

511225

26.7395

.001399

.0374

716

512656

26.7582

.001397

.0374

717

514089

26.7769

.001395

.0373

718

515524

26.7955

.001393

.0373

719

516961

26.8142

.001391

.0373

720

518400

26.8328

.001389

.0373

721

519841

26.8514

.001387

.0372

722

521284

26.8701

.001385

.0372

723

522729

26.8887

.001383

.0372

724

524176

26.9072

.001381

.0372

725

525625

26.9258

.001379

.0371

726

527076

26.9444

.001377

.0371

727

528529

26.9629

.001376

.0371

728

529984

26.9815

.001374

.0371

729

531441

27.0000

.001372

.0370

730

532900

27.0185

.001370

.0370

731

534361

27.0370

.001368

.0370

732

535824

27.0555

.001366

.0370

733

537289

27.0740

.001364

.0369

734

538756

27.0924

.001362

.0369

735

540225

27.1109

.001361

.0369

736

541696

27.1293

.001359

.0369

737

543169

27.1477

.001357

.0368

738

544644

27.1662

.001355

.0368

739

546121

27.1846

.001353

.0368

740

547600

27.2029

.001351

.0368

741

549081

27.2213

.001350

.0367

742

550564

27.2397

.001348

.0367

743

552049

27.2580

.001346

.0367

744

553536

27.2764

.001344

.0367

745

555025

27.2947

.001342

.0366

746

556516

27.3130

.001340

.0366

747

558009

27.3313

.001339

.0366

748

559504

27.3496

.001337

.0366

749

561001

27.3679

.001335

.0365

750

562500

27.3861

.001333

.0365







































Table I (continued)

n

n²

√n

1/n

1/√n

751

564001 

27.4044 

.001332 

.0365 

752

565504 

27.4226 

.001330 

.0365

753

567009 

27.4408 

.001328 

.0364

754

568516 

27.4591 

.001326 

.0364

755

570025 

27.4773 

.001325 

.0364

756

571536 

27.4955 

.001323 

.0364 

757

573049 

27.5136 

.001321 

.0363 

758

574564 

27.5318 

.001319 

.0363 

759

576081 

27.5500 

.001318 

.0363

760 

577600 

27.5681 

.001316 

.0363 

761 

579121 

27.5862 

.001314 

.0363 

762 

580644 

27.6043 

.001312 

.0362 

763 

582169 

27.6225 

.001311 

.0362 

764 

583696 

27.6405 

.001309 

.0362 

765

585225 

27.6586 

.001307 

.0362 

766 

586756 

27.6767

.001305 

.0361 

767 

588289 

27.6948 

.001304 

.0361 

768 

589824 

27.7128 

.001302 

.0361

769 

591361 

27.7308 

.001300

.0361

770 

592900 

27.7489

.001299 

.0360

771 

594441 

27.7669 

.001297 

.0360

772 

595984 

27.7849 

.001295 

.0360 

773 

597529 

27.8029 

.001294 

.0360 

774 

599076 

27.8209 

.001292 

.0359 

775

600625 

27.8388 

.001290 

.0359

776 

602176 

27.8568 

.001289 

.0359 

777 

603729 

27.8747 

.001287 

.0359 

778 

605284 

27.8927 

.001285 

.0359 

779 

606841 

27.9106 

.001284 

.0358

780

608400

27.9285 

.001282 

.0358 

781 

609961 

27.9464 

.001280 

.0358 

782 

611524 

27.9643 

.001279 

.0358 

783 

613089 

27.9821 

.001277 

.0357 

784 

614656 

28.0000 

.001276 

.0357 

785 

616225 

28.0179 

.001274 

.0357 

786 

617796 

28.0357 

.001272 

.0357 

787 

619369 

28.0535 

.001271 

.0356 

788 

620944 

28.0713 

.001269 

.0356 

789 

622521 

28.0891 

.001267 

.0356

790

624100

28.1069 

.001266 

.0356 

791 

625681 

28.1247 

.001264 

.0356 

792 

627264 

28.1425 

.001263 

.0355 

793 

628849 

28.1603 

.001261 

.0355 

794 

630436 

28.1780 

.001259 

.0355 

795 

632025 

28.1957 

.001258 

.0355 

796 

633616 

28.2135 

.001256 

.0354 

797 

635209 

28.2312 

.001255 

.0354 

798 

636804 

28.2489 

.001253 

.0354 

799 

638401 

28.2666 

.001252 

.0354 

800 

640000 

28.2843 

.001250 

.0354 


















Table I (continued)

n

n²

√n

1/n

1/√n

801 

641601 

28.3019 

.001248 

.0353 

802

643204 

28.3196 

.001247 

.0353 

803

644809 

28.3373 

.001245 

.0353 

804 

646416 

28.3549 

.001244 

.0353 

805

648025 

28.3725 

.001242 

.0352 

806 

649636 

28.3901 

.001241 

.0352 

807 

651249 

28.4077 

.001239 

.0352 

808 

652864 

28.4253 

.001238 

.0352 

809 

654481 

28.4429 

.001236 

.0352 

810 

656100 

28.4605 

.001235 

.0351 

811 

657721 

28.4781 

.001233 

.0351 

812 

659344 

28.4956 

.001232 

.0351 

813

660969 

28.5132 

.001230 

.0351 

814 

662596 

28.5307 

.001229 

.0351 

815

664225 

28.5482 

.001227 

.0350 

816 

665856 

28.5657 

.001225 

.0350 

817 

667489 

28.5832 

.001224 

.0350 

818 

669124 

28.6007 

.001222 

.0350 

819 

670761 

28.6182 

.001221 

.0349 

820 

672400 

28.6356 

.001220 

.0349 

821 

674041 

28.6531 

.001218 

.0349 

822 

675684 

28.6705 

.001217 

.0349 

823

677329 

28.6880 

.001215 

.0349 

824 

678976 

28.7054 

.001214 

.0348 

825

680625 

28.7228 

.001212 

.0348 

826 

682276 

28.7402 

.001211 

.0348 

827 

683929 

28.7576 

.001209 

.0348 

828 

685584 

28.7750 

.001208 

.0348 

829 

687241 

28.7924 

.001206 

.0347 

830

688900 

28.8097 

.001205 

.0347 

831

690561 

28.8271 

.001203 

.0347 

832

692224 

28.8444 

.001202 

.0347 

833

693889 

28.8617 

.001200 

.0346 

834

695556 

28.8791 

.001199 

.0346 

835

697225 

28.8964 

.001198 

.0346 

836

698896 

28.9137 

.001196 

.0346 

837

700569 

28.9310 

.001195 

.0346 

838

702244 

28.9482 

.001193 

.0345 

839

703921 

28.9655 

.001192 

.0345 

840 

705600 

28.9828 

.001190 

.0345 

841 

707281 

29.0000 

.001189 

.0345 

842 

708964 

29.0172 

.001188 

.0345 

843

710649 

29.0345 

.001186 

.0344 

844 

712336 

29.0517 

.001185 

.0344 

845

714025 

29.0689 

.001183 

.0344 

846 

715716 

29.0861 

.001182 

.0344 

847 

717409 

29.1033 

.001181 

.0344 

848 

719104 

29.1204 

.001179 

.0343 

849 

720801 

29.1376 

.001178 

.0343 

850

722500 

29.1548 

.001176 

.0343 




Table I (continued)

n

n²

√n

1/n

1/√n

851 

724201 

29.1719 

.001175 

.0343 

852 

725904 

29.1890 

.001174 

.0343 

853

727609 

29.2062 

.001172 

.0342 

854 

729316 

29.2233 

.001171 

.0342 

855 

731025 

29.2404 

.001170 

.0342 

856

732736 

29.2575 

.001168 

.0342 

857 

734449 

29.2746 

.001167 

.0342 

858 

736164 

29.2916 

.001166 

.0341 

859 

737881 

29.3087 

.001164 

.0341 

860 

739600 

29.3258 

.001163 

.0341 

861 

741321 

29.3428 

.001161 

.0341 

862 

743044 

29.3598 

.001160 

.0341 

863 

744769 

29.3769 

.001159 

.0340 

864 

746496 

29.3939 

.001157 

.0340 

865 

748225 

29.4109 

.001156 

.0340 

866 

749956 

29.4279 

.001155 

.0340 

867 

751689 

29.4449 

.001153 

.0340 

868 

753424 

29.4618 

.001152 

.0339 

869 

755161 

29.4788 

.001151 

.0339 

870 

756900 

29.4958 

.001149 

.0339 

871 

758641 

29.5127 

.001148 

.0339 

872 

760384 

29.5296 

.001147 

.0339 

873 

762129 

29.5466 

.001145 

.0338 

874 

763876 

29.5635 

.001144 

.0338 

875 

765625 

29.5804 

.001143 

.0338 

876 

767376 

29.5973 

.001142 

.0338 

877 

769129 

29.6142 

.001140 

.0338 

878 

770884 

29.6311 

.001139 

.0337 

879 

772641 

29.6479 

.001138 

.0337 

880 

774400 

29.6648 

.001136 

.0337 

881 

776161 

29.6816 

.001135 

.0337 

882 

777924 

29.6985 

.001134 

.0337 

883 

779689 

29.7153 

.001133 

.0337 

884 

781456 

29.7321 

.001131 

.0336 

885 

783225 

29.7489

.001130 

.0336 

886 

784996 

29.7658 

.001129 

.0336 

887 

786769 

29.7825 

.001127 

.0336 

888 

788544 

29.7993 

.001126 

.0336 

889 

790321 

29.8161 

.001125 

.0335

890 

792100 

29.8329 

.001124 

.0335

891 

793881 

29.8496 

.001122 

.0335 

892 

795664 

29.8664 

.001121 

.0335 

893 

797449 

29.8831 

.001120 

.0335 

894 

799236 

29.8998 

.001119 

.0334 

895 

801025 

29.9166 

.001117 

.0334 

896 

802816 

29.9333 

.001116 

.0334 

897 

804609

29.9500 

.001115 

.0334 

898 

806404 

29.9666 

.001114 

.0334 

899 

808201 

29.9833 

.001112 

.0334 

900 

810000 

30.0000 

.001111 

.0333 




Table I (continued)

n

n²

√n

1/n

1/√n

901

811801

30.0167 

.001110 

.0333

902

813604

30.0333 

.001109 

.0333

903

815409

30.0500 

.001107 

.0333

904

817216

30.0666 

.001106 

.0333

905

819025

30.0832 

.001105 

.0332

906

820836

30.0998 

.001104 

.0332

907

822649

30.1164 

.001103 

.0332

908

824464

30.1330 

.001101 

.0332

909

826281

30.1496 

.001100 

.0332 

910 

828100 

30.1662 

.001099 

.0331 

911 

829921 

30.1828 

.001098 

.0331 

912 

831744 

30.1993 

.001096 

.0331 

913

833569 

30.2159 

.001095 

.0331 

914 

835396 

30.2324 

.001094 

.0331 

915

837225 

30.2490 

.001093

.0331

916

839056 

30.2655 

.001092 

.0330 

917 

840889 

30.2820 

.001091

.0330

918

842724 

30.2985 

.001089 

.0330 

919 

844561 

30.3150 

.001088 

.0330 

920 

846400 

30.3315 

.001087

.0330

921

848241 

30.3480 

.001086 

.0330 

922 

850084 

30.3645 

.001085 

.0329 

923

851929 

30.3809 

.001083

.0329

924

853776 

30.3974 

.001082 

.0329 

925

855625 

30.4138 

.001081 

.0329 

926 

857476 

30.4302

.001080

.0329

927

859329 

30.4467 

.001079 

.0328 

928 

861184 

30.4631 

.001078 

.0328 

929 

863041 

30.4795 

.001076 

.0328 

930

864900 

30.4959 

.001075 

.0328 

931

866761 

30.5123 

.001074 

.0328 

932

868624 

30.5287 

.001073 

.0328 

933

870489 

30.5450 

.001072

.0327

934

872356 

30.5614 

.001071 

.0327 

935

874225 

30.5778 

.001070 

.0327 

936

876096 

30.5941 

.001068 

.0327 

937

877969 

30.6105 

.001067 

.0327 

938

879844 

30.6268 

.001066

.0327

939

881721 

30.6431 

.001065 

.0326 

940 

883600 

30.6594 

.001064 

.0326 

941 

885481 

30.6757 

.001063 

.0326 

942 

887364 

30.6920 

.001062 

.0326 

943

889249 

30.7083 

.001060 

.0326 

944 

891136 

30.7246 

.001059 

.0325 

945 

893025 

30.7409 

.001058 

.0325 

946 

894916 

30^^571 

.001057 


947 

896809 

30.7734 

.001056 

.0325 

948 

898704 

30^^896 

.001055 

.0325 

949 

900601 

30.8058 

.001054 

.0325 

950 

902500 

30.8221 

.001053 

.0324 

















































SQUARES, SQUARE ROOTS, AND RECIPROCALS 
Table I (conl/nuecl) 


541 


n 

n * 

Vn 

1 

n 

JL 

y/it 

951 

904401 

30.8383 

.001052 

.0324 

952 

906304 

30.8545 

.001050 

.0324 

958 

908209 

30.8707 

.001049 

.0324 

954 

910116 

30.8869 

.001048 

.0324 

955 

912025 

30.9031 

.001047 

.0324 

956 

913936 

30.9192 

.001046 

.0323 

957 

915849 

30.9354 

.001045 

.0323 

958 

917764 

30.9516 

.001044 

.0323 

959 

919681 

30.9677 

.001043 

.0323 

960 

921600 

30.9839 

.001042 

.0323 

961 

923521 

31.0000 

.001041 

.0323 

962 

925444 

31.0161 

.001040 

.0322 

968 

927369 

31.0322 

.001038 

.0322 

964 

929296 

31.0483 

.001037 

.0322 

965 

931225 

31.0644 

.001036 

.0322 

966 

933156 

31.0805 

.001035 

.0322 

967 

935089 

31.0966 

.001034 

.0322 

968 

937024 

31.1127 

.001033 

.0321 

969 

938961 

31.1288 

.001032 

.0321 

970 

940900 

31.1448 

.001031 

.0321 

971 

942841 

31.1609 

.001030 

.0321 

972 

944784 

31.1769 

.001029 

.0321 

978 

946729 

31,1929 

.001028 

.0321 

974 

948676 

31.2090 

.001027 

.0320 

975 

950625 

31.2250 

.001026 

.0320 

976 

952576 

31.2410 

.001025 

.0320 

977 

954529 

31.2570 

.001024 

.0320 

978 

956484 

31.2730 

.001022 

.0320 

979 

958441 

31.2890 

.001021 

.0320 

980 

960400 

31.3050 

.001020 

.0319 

981 

962361 

31.3209 

.001019 

.0319 

982 

964324 

31.3369 

.001018 

.0319 

988 

966289 

31.3528 

.001017 

.0319 

984 

968256 

31.3688 

.001016 

.0319 

986 

970225 

31.3847 

.001015 

0319 

986 

972196 

31.4006 

.001014 

.0318 

987 

974169 

31.4166 

.001013 

.0318 

988 

976144 

31.4325 

.001012 

.0318 

989 

978121 

31.4484 

.001011 

.0318 

990 

980100 

31.4643 

.001010 

.0318 

991 

982081 

31.4802 

.001009 

.0318 

992 

984064 

31.4960 

.001008 

.0318 

998 

986049 

31.5119 

.001007 

.0317 

994 

988036 

31.5278 

.001006 

.0317 

995 

990025 

31.5436 

.001005 

.0317 

996 

992016 

31.5595 

.001004 

.0317 

997 

994009 

31.5753 

.001003 

.0317 

998 

996004 

31.5911 

.001002 

.0317 

999 

998001 

31.6070 

.001001 

.0316 

1000 

1000000 

31.6228 

.001000 

.0316 






Table II 


A TABLE OF RANDOM NUMBERS* 

Locating the Starting Point of a Series of Random Numbers 

The procedure ordinarily employed to locate the first number of a series is simply 
to close one’s eyes, place one’s finger or a pencil on the table, and take the number 
thus pointed to as the first one of the series.† Once the starting point is located, it
makes no difference to the random character of the series whether successive digits 
of the series are obtained by going across the row, up or down the column, obliquely, 
or in any other direction. If two- or three-place numbers are needed, they can be 
readily obtained by combining adjoining digits of two or three columns or rows. 


* From J. G. Peatman and Roy Schafer, “A Table of Random Numbers from Selective
Service Numbers,” Journal of Psychology, 14:295-305, 1942.

† A more systematic method for locating the initial number of a series may be employed
if an investigator wishes to indulge in the following “game”: Place a pencil or finger on
the page without looking at the table. Combine the digit thus obtained with the one
immediately above to give a two-place number (the digit above supplying the first place,
as in the illustration below). If this two-place number is 32 or less, use it to locate the
column. If it is greater than 32, combine the initial digit with those around it in clockwise
or counterclockwise order, continuing until a two-place number of 32 or less is obtained.
Repeat the procedure in order to locate the row, bearing in mind that there are only 50
rows, so two-place numbers greater than 50 cannot be used.

To illustrate: With eyes closed, we locate a digit on the table, and we find it to be Digit 4 
of Column 15, Row 25. The digit immediately above this is 9. This, therefore, gives a two-place
number equal to 94, which is too large to locate the column. Proceeding in a clockwise direction,
we find the next digit to be 3. However, 34 is still too large. The next one is 9; 94
again is too large. The next one is 2; 24 then is a two-place number giving us the location 
of the column, namely, the 24th column. 

Again placing the finger on the page with eyes closed, we locate Number 5 of Column 7,
Row 24. The digit immediately above is 5. Number 55 is too large for the location of the
row. Proceeding this time counterclockwise, we find the next digit (diagonally above and to
the left) to be 1. This gives a two-place number equal to 15, and the 15th row is chosen. The
initial digit for beginning a series of random numbers is thus located in the 24th column
and 15th row. This is Number 2.
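The footnote's "game" for locating a starting point can be sketched in Python as follows. This is an illustration only. The function name `locate` and its parameters are hypothetical. The 0-based grid indexing is an assumption, and so is the handling of cases the text leaves open: neighbours beyond the edge of the table are skipped, 00 is treated as unusable, and `None` is returned if no neighbour gives a usable number. As in the worked examples, the neighbouring digit supplies the tens place and the initial digit the units place (9 above 4 gives 94).

```python
# Neighbour offsets (row step, column step), starting with the digit directly
# above the initial digit and moving clockwise around it.
CLOCKWISE = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def locate(grid, row, col, limit, clockwise=True):
    """Return the first usable two-place number, or None if there is none.

    grid     -- list of rows of single digits (0-based indexing here)
    row, col -- position of the initial digit in the grid
    limit    -- largest usable number (32 for columns, 50 for rows)
    """
    if clockwise:
        offsets = CLOCKWISE
    else:
        # Start above, then go the other way round: NW, W, SW, S, SE, E, NE.
        offsets = [CLOCKWISE[0]] + CLOCKWISE[:0:-1]
    initial = grid[row][col]
    for d_row, d_col in offsets:
        r, c = row + d_row, col + d_col
        if 0 <= r < len(grid) and 0 <= c < len(grid[r]):  # skip off-table cells
            value = 10 * grid[r][c] + initial  # neighbour gives the tens place
            if 1 <= value <= limit:
                return value
    return None  # no usable neighbour: choose a fresh starting digit


# Digits around the first worked example (rows 24-26, columns 14-16); the
# initial digit is the 4 in the centre (column 15, row 25).
print(locate([[2, 9, 3], [7, 4, 9], [6, 4, 2]], 1, 1, 32))  # 24
```

In the second worked example, the digits around Number 5 of Column 7, Row 24 are searched counterclockwise (`clockwise=False`) with a limit of 50. This gives 15.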


Table II (continued)

Column number
Row  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
  1  2  7  8  9  4  0  7  2  3  2  5  4  2  6  7  1
  2  2  2  6  0  4  1  7  7  3  8  7  3  6  7  9  4
  3  9  1  6  6  3  9  4  9  1  0  5  1  5  2  2  7
  4  7  0  5  5  9  2  7  5  7  8  0  8  8  5  0  6
  5  4  7  3  6  6  3  9  8  2  1  7  9  7  6  4  2
  6  8  2  0  2  8  7  7  6  0  2  2  3  1  1  1  6
  7  0  8  7  5  3  3  6  4  2  6  8  3  1  6  5  0
  8  9  4  1  9  0  8  4  6  6  8  6  3  3  2  2  3
  9  5  0  0  6  7  4  0  0  0  1  9  5  9  9  1  8
 10  1  9  5  4  1  5  2  6  2  9  4  1  1  5  8  4
 11  5  6  4  4  1  8  7  2  8  3  6  1  5  9  8  6
 12  7  9  2  5  1  9  7  9  3  1  8  6  8  7  7  6
 13  3  3  3  5  9  5  1  4  0  8  2  5  6  3  5  4
 14  1  9  0  4  0  0  9  9  5  7  4  1  5  9  4  7
 15  5  4  4  7  2  0  3  7  9  1  0  9  6  2  9  7
 16  2  9  8  2  5  5  9  3  2  0  4  9  0  6  4  4
 17  9  7  4  2  6  7  7  3  3  3  1  7  5  0  9  6
 18  5  8  2  4  3  3  0  8  5  3  5  7  5  8  3  5
 19  4  3  4  9  5  0  3  6  2  9  7  4  6  2  5  6
 20  1  1  9  8  4  8  0  6  7  0  9  7  9  6  9  9
 21  6  9  1  8  3  3  7  5  9  6  6  7  7  6  0  4
 22  7  0  0  3  8  1  3  4  7  9  5  2  6  9  9  7
 23  3  7  2  0  8  1  5  6  9  0  1  7  8  9  6  6
 24  2  7  0  0  0  6  5  0  6  5  6  0  3  2  9  3
 25  3  0  7  0  7  8  4  9  4  2  8  2  4  7  4  9
 26  6  2  9  3  3  1  7  7  5  2  2  3  4  6  4  2
 27  5  4  9  2  1  4  8  5  7  0  9  6  4  7  2  1
 28  0  3  7  0  1  7  3  8  0  3  6  2  3  1  0  9
 29  9  3  6  6  2  2  0  9  7  2  3  9  2  8  7  3
 30  2  9  5  6  9  9  5  6  9  8  2  8  0  0  4  4
 31  8  5  7  2  9  2  6  5  9  3  9  7  1  8  3  5
 32  8  4  5  7  7  9  9  5  1  4  5  5  0  9  5  3
 33  8  7  9  8  1  8  4  1  4  3  7  7  0  9  1  9
 34  7  3  2  5  1  8  6  3  2  8  5  8  6  9  3  4
 35  8  9  9  0  1  8  8  8  9  5  7  5  0  4  1  1
 36  0  2  9  7  8  8  1  7  6  1  6  7  6  4  2  5
 37  0  5  2  3  2  3  8  1  8  8  1  6  2  3  0  7
 38  2  2  6  8  1  6  9  6  2  6  7  9  1  7  8  0
 39  0  7  8  4  9  5  8  8  0  7  2  1  8  1  7  5
 40  4  8  0  7  0  5  9  9  4  9  6  9  8  2  0  6
 41  9  2  0  1  6  7  2  8  3  9  8  8  3  4  7  8
 42  0  8  8  3  4  0  9  2  2  8  1  5  0  4  8  2
 43  2  0  6  9  7  5  2  8  2  5  5  4  0  7  7  1
 44  3  1  8  6  8  3  5  6  3  2  7  4  1  8  9  4
 45  0  0  8  6  1  7  5  0  8  5  6  5  0  8  2  7
 46  3  3  2  9  4  2  5  3  3  8  2  4  2  6  2  5
 47  8  4  7  4  0  4  5  1  2  1  0  4  2  5  7  7
 48  0  2  4  3  0  2  0  7  2  8  8  0  8  4  1  6
 49  4  6  5  6  3  0  4  5  2  0  1  5  2  7  9  5
 50  3  4  8  3  4  5  8  7  5  9  7  1  6  3  9  9



Table II (continued)

Column number
Row 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
  1  6  5  5  9  1  3  5  4  0  3  6  6  7  6  5  1
  2  2  1  3  8  9  0  3  4  9  0  2  6  3  0  9  8
  3  5  2  5  3  4  1  3  9  5  8  1  3  8  2  9  2
  4  0  5  9  0  5  7  4  5  2  0  6  1  6  4  2  0
  5  4  9  6  0  3  6  3  5  3  9  9  1  8  5  1  3
  6  4  8  5  2  2  3  4  2  2  6  5  2  2  4  9  6
  7  0  5  5  7  8  1  0  1  2  9  1  4  3  4  7  6
  8  7  4  7  5  1  5  7  6  3  7  9  4  5  5  3  5
  9  1  4  7  4  9  8  7  2  4  3  0  8  6  4  2  7
 10  4  4  6  1  8  7  8  6  4  8  7  4  4  0  5  8
 11  2  2  9  1  9  0  4  8  1  0  1  3  5  3  4  4
 12  6  5  0  3  8  1  1  2  4  7  8  9  1  7  5  2
 13  6  5  7  2  6  7  8  9  9  9  8  0  9  1  5  3
 14  6  4  8  2  6  4  4  1  8  8  1  5  4  3  8  0
 15  4  7  6  1  1  6  1  2  2  9  5  8  4  4  8  6
 16  2  1  5  7  3  6  5  5  4  5  7  9  6  6  4  0
 17  1  1  3  9  2  1  1  0  0  1  3  7  7  3  7  3
 18  9  3  4  5  4  6  3  9  2  7  1  1  4  9  1  3
 19  9  8  3  6  1  4  0  3  5  9  7  1  8  0  6  9
 20  4  0  6  0  0  5  9  6  5  1  4  2  0  4  1  9
 21  5  3  4  5  7  3  0  6  1  0  3  0  0  3  5  0
 22  3  2  5  0  2  3  5  3  9  7  4  8  9  4  1  5
 23  6  0  7  8  1  9  6  7  4  8  9  6  3  6  5  1
 24  1  7  2  2  8  4  9  0  4  3  2  4  5  5  1  2
 25  6  0  4  3  8  1  7  7  0  9  8  4  6  3  1  2
 26  2  4  7  5  4  4  4  1  7  1  6  7  1  2  6  8
 27  8  9  7  6  1  3  3  4  6  6  5  9  0  7  0  3
 28  5  5  2  5  9  2  0  2  8  7  7  2  0  2  7  2
 29  1  0  7  0  8  9  3  8  8  5  3  1  3  1  0  9
 30  8  8  5  7  2  1  3  4  9  5  2  6  8  3  6  6
 31  6  6  1  2  1  5  5  5  6  1  7  1  5  7  5  9
 32  1  3  9  3  7  8  1  4  0  5  4  1  5  4  4  0
 33  4  6  1  3  8  6  5  9  2  2  8  1  6  9  0  1
 34  5  2  6  1  9  0  6  9  0  5  4  6  8  0  3  2
 35  6  0  3  1  3  0  3  5  8  9  2  7  8  8  7  1
 36  0  5  8  3  2  4  7  7  2  2  6  2  6  8  6  0
 37  3  0  1  2  6  2  6  8  3  7  4  4  3  8  9  9
 38  2  4  8  0  4  7  3  3  8  4  4  8  4  3  3  8
 39  3  0  7  4  1  0  3  2  0  1  2  8  6  5  9  4
 40  4  0  7  8  1  1  4  2  1  6  7  0  7  3  1  2
 41  4  0  5  1  6  8  7  8  3  5  4  5  0  4  0  6
 42  6  2  9  2  1  9  8  5  3  1  0  7  8  5  3  9
 43  7  8  6  8  5  1  3  7  8  2  7  1  9  3  6  3
 44  5  6  8  0  6  4  6  4  1  0  9  1  9  8  1  4
 45  1  1  6  3  4  6  0  0  9  4  7  9  2  4  8  7
 46  2  9  0  1  3  7  6  5  9  1  4  6  0  1  0  0
 47  9  4  6  5  8  3  3  8  1  0  3  7  7  7  8  6
 48  0  2  3  5  9  7  5  1  3  6  3  2  8  7  5  8
 49  3  0  2  2  1  6  1  1  0  0  9  1  6  1  7  7
 50  0  9  4  2  5  8  9  5  3  3  3  6  4  5  2  0




Glossary of Symbols


(NOTE: The chief symbols in this book and the page on which they are first used
are listed below.)

a area from mean of normal distribution, 264
A Yule’s Coefficient of Association, 91
A.D. average deviation, 168

c correction (in computation of mean and σ from a guessed mean), 159
C Contingency Coefficient, 94

C_c any centile point value (usually written with numerical subscript, thus:
C_1, C_2, etc.), 138
χ² chi-square, 426

d differences, especially between paired deviations, 248
D differences between ranks or paired measures in original score form, 248
D a decile point value or a decile interval (usually written with numerical
subscript, thus: D_1, D_2, etc.), 130
D range, 130

d.f. degrees of freedom, 428
DK’s Don’t Knows, 28 ff.

E index of predictive efficiency, 459

f frequencies (number of cases in a class interval or in a group of data), 113
f_h hypothetical frequencies, 95

G.M. guessed mean, 157

h hypothetical value of a parameter (usually a subscript), 323

i size of a class interval, 137
I.Q. intelligence quotient, 54

k coefficient of alienation, 452
Ku kurtosis, 392

L ratio between hypothetical length and actual length of a test, 475

M arithmetic mean, 150
Mdn. median, 139
Mo mode, 151

N number of cases or frequencies in a group of data, 51
N_c number of cases in a column, 84
N_i number of cases or frequencies in an interval, 135-136
N_r number of cases in a row, 84
N_s number of cases in a sample, 323
N_u number of cases in a universe, 323

p proportion, 43
P probability value, 329
P.E. probable error, 183

φ phi coefficient of correlation, 92
φ_r phi correlation coefficient for dichotomized variables, 93
q proportionate remainder, 1.0 − p, 329

Q a quartile point value or a quartile interval (usually identified with
appropriate subscripts, thus: Q_1, Q_3); also used to symbolize the quartile
deviation, 128
Q_1 to Q_3 inter-quartile range, 128
Q.D. quartile deviation, 140
Qn a quintile point value or a quintile interval (usually identified with
appropriate subscripts, thus: Qn_1, Qn_2, etc.), 130

r Pearson’s product-moment correlation coefficient, 197
R multiple correlation coefficient, 482
ρ, rho Spearman’s rank-difference correlation coefficient, 254
r_bi biserial correlation coefficient, 259
r_t tetrachoric correlation coefficient, 276
r_tri triserial correlation coefficient, 272
r² coefficient of determination, 489

s statistic; a sample value (usually a subscript), 323
S arithmetic summation of a series of measures, 169
S Standard score, 54
σ standard deviation, 150
σ (with the subscript of a statistic) standard error, 325
Σ algebraic summation of a series of measures, 151
Sk skewness, 390

t test ratio of Test of Significance in small sample theory, 397
T test ratio of Test of Significance, 343
T a tercile point value or a tercile interval (usually identified with appropriate
subscripts, thus: T_1, T_2), 130
T_1 to T_2 inter-tercile range, 128
T.D. tercile deviation, 141

u universe (as subscript), 323

V any variable, 243
V Pearson’s Coefficient of Relative Variation, 171
Vn a vigintile point value or a vigintile interval (usually identified by
appropriate subscripts, thus: Vn_1, Vn_2, etc.), 130

x deviate value of X from mean; also a variable, 151
x' deviations in unit interval terms, 157
X original score, 166

y deviate value of Y from mean; also a variable, 151; also ordinate, 260

z z scores (deviations in units of the standard deviation), 177
z Fisher’s z transformation function for r, 386




Glossary of Principal Formulas 


(In the case of alternative formulas, more than one page number is given unless
they appear on the same page.)


Alienation, coefficient of, 452

$k = \sqrt{1 - r^2}$



Arithmetic mean, see Mean
Association, Coefficient of, 92

$A = \frac{ad - bc}{ad + bc}$
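Here is a minimal Python sketch of the Coefficient of Association, A = (ad - bc)/(ad + bc). It assumes the fourfold table is labelled so that a and d lie on one diagonal and b and c on the other. The function name `yule_a` and the frequencies are illustrative only.

```python
def yule_a(a, b, c, d):
    """Yule's Coefficient of Association, A = (ad - bc) / (ad + bc).

    a and d are taken to be the two cells on one diagonal of the fourfold
    table and b and c the cells on the other; swapping that labelling
    reverses the sign of A.
    """
    return (a * d - b * c) / (a * d + b * c)


print(yule_a(30, 10, 10, 30))  # 0.8
print(yule_a(20, 0, 5, 15))    # 1.0 (an empty cell forces |A| = 1)
```

The second call shows a known property of A: a single empty cell drives the coefficient to its limit, however weak the association in the other cells.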


Average deviation
grouped data, 170

$A.D. = \frac{S f(x)}{N}$

ungrouped data, 169

$A.D. = \frac{S(X - M)}{N} = \frac{S(x)}{N}$

standard error of, 380

$\sigma_{AD} = \frac{0.603\sigma}{\sqrt{N_s}}$, or in terms of A.D., $\sigma_{AD} = \frac{0.756\,A.D.}{\sqrt{N_s}}$


Binomial for any power of n, 338

$(p + q)^n = p^n + n p^{n-1} q + \frac{n(n-1)}{1 \cdot 2} p^{n-2} q^2 + \cdots + q^n$


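Biserial r, 260

$r_{bi} = \frac{M_p - M_q}{\sigma} \cdot \frac{pq}{y}$

alternative form, Dunlap’s formula, 260

$r_{bi} = \frac{M_p - M_t}{\sigma} \cdot \frac{p}{y}$

point-biserial r, 271

$r_{p\text{-}bi} = \frac{M_p - M_q}{\sigma}\sqrt{pq}$

standard error of, 388, 389

$\sigma_{r_{bi}} = \frac{\sqrt{pq}/y - r_{bi}^2}{\sqrt{N_s}}$, or for the null hypothesis, $\sigma_{r_{bi}} = \frac{\sqrt{pq}}{y\sqrt{N_s}}$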

Centile, 136


Cc^Xi^\ 

standard error of, 381, 382

$\sigma_{C_c} = \frac{\sigma}{y}\sqrt{\frac{pq}{N_s}}$, or in terms of Q, $\sigma_{C_c} = \frac{1.483Q}{y}\sqrt{\frac{pq}{N_s}}$


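Chi-square, 426

$\chi^2 = \Sigma\left[\frac{(f - f_h)^2}{f_h}\right]$

for Test of Independence, Pearson’s short-cut formula, 441

$\chi^2 = \frac{N(ad - bc)^2}{(a + b)(c + d)(b + d)(a + c)}$

in terms of phi, 443

$\chi^2 = N\phi^2$

Test of Significance, when d.f. > 30, 430

$T = \frac{s - h}{\sigma_s} = \frac{\left(\sqrt{2\chi^2} - \sqrt{2(d.f.) - 1}\right) - 0}{1.0}$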
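For a fourfold table, the general chi-square formula (the sum of (f - f_h)^2/f_h over the cells) and Pearson's short-cut N(ad - bc)^2 / ((a + b)(c + d)(b + d)(a + c)) should agree. The Python sketch below checks this. It assumes the cells are laid out as a b in the first row and c d in the second, and that the hypothetical frequency of each cell is (row total x column total)/N. The function names and frequencies are illustrative, and no continuity correction is applied, matching the short-cut formula as printed.

```python
def chi_square(observed, hypothetical):
    """chi^2 = sum over cells of (f - f_h)^2 / f_h."""
    return sum((f - fh) ** 2 / fh for f, fh in zip(observed, hypothetical))


def hypothetical_fourfold(a, b, c, d):
    """Chance (independence) frequencies for cells a, b, c, d of a table laid
    out as  a b / c d:  each cell is row total * column total / N."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    return [row_totals[i] * col_totals[j] / n for i in range(2) for j in range(2)]


def chi_square_fourfold(a, b, c, d):
    """Pearson's short-cut: N(ad - bc)^2 / ((a + b)(c + d)(b + d)(a + c))."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (b + d) * (a + c))


cells = (20, 5, 10, 15)  # made-up frequencies, for illustration only
# Both lines print the same value (about 8.33), with d.f. = 1 for a fourfold table.
print(chi_square(cells, hypothetical_fourfold(*cells)))
print(chi_square_fourfold(*cells))
```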

Coefficient of Relative Variation, 171

$V = \frac{100\sigma}{M}$

standard error of, 418 

V L . »/vY 


T = 



standard error of a difference between Coefficients of Relative Variation, 418 


Contingency coefficient of correlation
from chi-square, 443

$C = \sqrt{\frac{\chi^2}{N + \chi^2}}$

Pearson’s coefficient of mean square contingency, 95



Correlation, see Association; Biserial r; Contingency; Phi; Product-moment
r; etc.

between correlation coefficients (one array in common), 421 




r2a — ri2ri3(l — ras* — ria^ — T\^ -}■ 2r23ri2ri3) 

2(1 - M 


D range, 140

$D = C_{90} - C_{10}$

standard error of, 383

$\sigma_D = \frac{2.279\sigma}{\sqrt{N_s}}$, or in terms of Q, $\sigma_D = \frac{3.380Q}{\sqrt{N_s}}$, or in terms of D, $\sigma_D = \frac{.889D}{\sqrt{N_s}}$

Degrees of freedom for a Test of Independence, 441

$d.f. = (A_c - 1)(B_c - 1)$


Efficiency index, see Index of predictive efficiency 

Fisher’s z transformation function for r, see z
Frequencies, number at mean of normal distribution, 432 

r = = 

(r'VS 2.51a' 

Frequency, standard error of, 375 

CT / 

Hypothetical frequencies for any cell of fourfold table, 439 



Index of predictive efficiency, 460

$E = 100\%\,\left(1 - \sqrt{1 - r^2}\right) = 100\%\,(1 - k)$
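The index of predictive efficiency shows how slowly predictive power grows with r. The sketch below computes the coefficient of alienation, k = sqrt(1 - r^2), and then E = 100%(1 - k). The function names are illustrative, and Python's standard `math` module is assumed.

```python
import math


def alienation(r):
    """Coefficient of alienation, k = sqrt(1 - r^2)."""
    return math.sqrt(1 - r * r)


def predictive_efficiency(r):
    """Index of predictive efficiency, E = 100% (1 - k), as a percentage."""
    return 100 * (1 - alienation(r))


for r in (0.50, 0.60, 0.80, 0.866):
    print(f"r = {r:.3f}   k = {alienation(r):.3f}   E = {predictive_efficiency(r):.1f}%")
```

With r = .60, for example, k = .80, so E is only 20 per cent. An r of about .866 is needed before E reaches 50 per cent.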





Kurtosis, 392

$Ku = \frac{(C_{75} - C_{25})/2}{C_{90} - C_{10}} = \frac{Q}{D}$

standard error of, 392 
.27779 


Mean, arithmetic 
grouped data, 155 

or ^ 
N N 


grouped data, with guessed mean, 159

$M = G.M. + i\,\frac{\Sigma f x'}{N}$, or $M = G.M. + ic$

ungrouped data, 154 


standard error of, 376

$\sigma_M = \frac{\sigma}{\sqrt{N_s - 1}}$, or for large samples, $\sigma_M = \frac{\sigma}{\sqrt{N_s}}$

standard error of a difference between means of correlated samples, 413

$\sigma_{D_M} = \sqrt{\sigma_{M_1}^2 + \sigma_{M_2}^2 - 2r_{12}\,\sigma_{M_1}\sigma_{M_2}}$

standard error of a difference between means, when variance of samples is 
same, 411 

standard error of a difference between means, when variance of sampling 
distribution is based on average of both samples, 411 




Mean frequency of p events in binomial distribution, 333

$M_f = N_s p$

Mean, of a series of ranks, 1 to n, 258 





of two or more groups combined, 417

$M = \frac{N_1M_1 + N_2M_2 + \cdots + N_nM_n}{N_1 + N_2 + \cdots + N_n}$

Median, 139 

Mdn = X, + I 




standard error of, 382

$\sigma_{Mdn} = \frac{1.253\sigma}{\sqrt{N_s}}$, or in terms of Q, $\sigma_{Mdn} = \frac{1.858Q}{\sqrt{N_s}}$

Mode of a binomial distribution, 351

Mo = the integer value between $N_s p - q$ and $N_s p + p$

Multiple correlation coefficient (two variables with a third), 483

$R_{c \cdot xy} = \sqrt{\frac{r_{cx}^2 + r_{cy}^2 - 2r_{cx}r_{cy}r_{xy}}{1 - r_{xy}^2}}$


regression equation (three-variable problem), 484

$z_c = \frac{r_{cx} - r_{cy}r_{xy}}{1 - r_{xy}^2}\,z_x + \frac{r_{cy} - r_{cx}r_{xy}}{1 - r_{xy}^2}\,z_y$
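The three-variable regression weights and the multiple correlation coefficient are linked. Squaring R gives the sum of each weight times its validity: R^2 = beta_x r_cx + beta_y r_cy. The Python sketch below computes both quantities from the three intercorrelations so that this identity can be checked. The function names are illustrative, and the coefficients used are made up rather than taken from the text.

```python
import math


def beta_weights(r_cx, r_cy, r_xy):
    """Standard-score regression weights for predicting c from x and y."""
    denom = 1 - r_xy ** 2
    beta_x = (r_cx - r_cy * r_xy) / denom
    beta_y = (r_cy - r_cx * r_xy) / denom
    return beta_x, beta_y


def multiple_r(r_cx, r_cy, r_xy):
    """Multiple correlation of c with x and y combined."""
    return math.sqrt((r_cx ** 2 + r_cy ** 2 - 2 * r_cx * r_cy * r_xy) / (1 - r_xy ** 2))


# Illustrative (made-up) coefficients, not data from the text.
r_cx, r_cy, r_xy = 0.60, 0.50, 0.30
bx, by = beta_weights(r_cx, r_cy, r_xy)
print(f"z_c = {bx:.3f} z_x + {by:.3f} z_y,   R = {multiple_r(r_cx, r_cy, r_xy):.3f}")
```

When the two predictors are uncorrelated (r_xy = 0), the weights reduce to the validities themselves and R^2 = r_cx^2 + r_cy^2.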

Normal probability function in terms of σ, 184

$y = \frac{N}{\sigma\sqrt{2\pi}}\,e^{-x^2/2\sigma^2}$


Original score values
from Standard scores, 187

$X = M_x - 5.0\sigma_x + S\sigma_x$

from z scores, 215

$X = z_x\sigma_x + M_x$

standard error of, see Standard error of a measure 
Partial correlation coefficient (three-variable problem), 486

$r_{cy \cdot x} = \frac{r_{cy} - r_{cx}r_{xy}}{\sqrt{1 - r_{cx}^2}\,\sqrt{1 - r_{xy}^2}}$
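The partial correlation formula shows how much of the c-y relationship remains once x is held constant. The Python sketch below implements r_cy.x = (r_cy - r_cx r_xy) / (sqrt(1 - r_cx^2) sqrt(1 - r_xy^2)). The function name and example values are illustrative only.

```python
import math


def partial_r(r_cy, r_cx, r_xy):
    """Partial correlation r_cy.x: c with y, with x held constant."""
    return (r_cy - r_cx * r_xy) / math.sqrt((1 - r_cx ** 2) * (1 - r_xy ** 2))


print(round(partial_r(0.50, 0.60, 0.30), 3))  # about .42 with these made-up values
print(round(partial_r(0.18, 0.60, 0.30), 3))  # r_cy wholly accounted for by x
```

The second case uses r_cy = r_cx r_xy = .18. Here the whole correlation between c and y is carried through x, so the partial coefficient is zero.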

Pearson r, see Product-moment coefficient of correlation 
Percentage 

standard error of, 373 






standard error of a difference between percentages, correlated samples, 407 

Standard error of a difference between percentages, non-correlated samples, 
404 




= 100./m* 

\ N, 


PvQv 

' X 


Phi coefficient of correlation
for dichotomized non-variable attributes, 92

$\phi = \frac{bc - ad}{\sqrt{(a + b)(c + d)(a + c)(b + d)}}$
for dichotomized variate with true dichotomy, 93 


<t> = 




.798 

for dichotomized variates, 93 


<t»r — 


.637 


in terms of chi-square, 443

$\phi = \sqrt{\frac{\chi^2}{N}}$


Point-biserial r, see Biserial r 
Probability 

of joint occurrence of independent events, 336

$P_{(a \cdot b \cdot c \cdots n)} = p_a \cdot p_b \cdot p_c \cdots p_n$

of occurrence of disjunctive events, 336

$P_{(a + b + c + \cdots + n)} = p_a + p_b + p_c + \cdots + p_n$

ratio, 329 

p=^ 

Probable error of any statistic for normal sampling distributions of large
sample theory, 393

$P.E._s = .6745\sigma_s$

See also pp. 394-395 for P.E. formulas of commonly used statistics





Product-moment coefficient of correlation (Pearson r)
grouped data, with guessed means, 229



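ungrouped data, deviations, 226, 227

$r_{xy} = \frac{\Sigma(xy)/N}{\sigma_x\sigma_y} = \frac{\Sigma(xy)}{N\sigma_x\sigma_y} = \frac{\Sigma(xy)}{\sqrt{\Sigma x^2}\sqrt{\Sigma y^2}}$

ungrouped data, original scores, 237, 238

z score form, 224

$r_{xy} = \frac{\Sigma(z_x z_y)}{N}$

standard error of, 384

$\sigma_r = \frac{1 - r^2}{\sqrt{N_s}}$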


standard error of a difference between product-moment r’s, dependent 
samples with no array in common, 420 

standard error of a difference between product-moment r’s, dependent 
samples with one array in common, 420 

Standard error of a difference between product-moment r’s, non-correlated 
samples, 419 

Prophecy formula, 475

$r_{LL} = \frac{L\,r_{xx'}}{1 + (L - 1)\,r_{xx'}}$
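The prophecy formula predicts the reliability of a test lengthened (or shortened) L times: r_LL = L r_xx' / (1 + (L - 1) r_xx'). A minimal Python sketch follows. The function name and reliabilities are illustrative, and L = 2 gives the Spearman-Brown case for a test twice as long.

```python
def prophecy(r_xx, L):
    """Predicted reliability when the length of the test is multiplied by L."""
    return L * r_xx / (1 + (L - 1) * r_xx)


for L in (1, 2, 3, 4):
    print(L, round(prophecy(0.60, L), 3))
```

Doubling a test with reliability .60 raises the prediction to .75. Each further lengthening adds less, and a fractional L (for example 0.5) predicts the loss from shortening the test.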






Proportion 

standard error of, 375 


standard error of a difference between proportions, non-correlated samples, 
404 


^ 2 _L ^ 2 - 

~\1^ "Ny 

Quadriserial correlation coefficient, 273 

ykMh + (yd — yh)Md + {yt — yd)Mh — y^Aff 


rgtiad — 


>** , (yj - y*)* . (yt - VdY . y6’*1 




Quartile, standard error of Q_1 and Q_3, 382

$\sigma_{Q_1} = \sigma_{Q_3} = \frac{1.362\sigma}{\sqrt{N_s}}$, or in terms of Q, $\sigma_{Q_1} = \sigma_{Q_3} = \frac{2.020Q}{\sqrt{N_s}}$


Quartile deviation, 140

$Q.D. = \frac{Q_3 - Q_1}{2} = \frac{C_{75} - C_{25}}{2}$

standard error of, 383

$\sigma_Q = \frac{.787\sigma}{\sqrt{N_s}}$, or in terms of Q, $\sigma_Q = \frac{1.166Q}{\sqrt{N_s}}$

Quintiserial correlation coefficient, 275 

yhMh + {yd — yh)Md 4- (y^ — yd)Mc + (y& — yc)Mz - yhMi 


rgutnf — 


y** . (y< - y*)* , (yc - ydT , (y6 - y^Y , y/ 




r, see Product-moment coefficient of correlation 
r by method of differences, 248 

- — or when M’s and <t’s are equal r = 1 

ZCxVy 

r by method of sums, 247 


Rank-difference correlation coefficient, 255

$\rho = 1 - \frac{6\Sigma(D^2)}{N(N^2 - 1)}$





standard error of, 388 


<^p 


(1 - PH^) 

vw. 


or for null hypothesis 




1 

vw. 
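A minimal Python sketch of the rank-difference coefficient, rho = 1 - 6 sum(D^2) / (N(N^2 - 1)), where D is the difference between each pair of ranks. It assumes the ranks are already assigned and untied (tied ranks would first be averaged). The function name and rank lists are illustrative.

```python
def rank_difference_rho(ranks_x, ranks_y):
    """Spearman's rho from two lists of untied ranks 1..N."""
    n = len(ranks_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


print(rank_difference_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
print(rank_difference_rho([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))  # -1.0
```

Two adjacent swaps give sum(D^2) = 4 and rho = .80. Complete reversal of the order gives the lower limit of -1.0.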


Regression coefficient
x on y, 223

$b_{xy} = r_{xy}\,\frac{\sigma_x}{\sigma_y}$

y on x, 223

$b_{yx} = r_{xy}\,\frac{\sigma_y}{\sigma_x}$

z_x on z_y, 223

$\beta_{xy} = r_{xy}$

z_y on z_x, 223

$\beta_{yx} = r_{yx}$

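Regression equation
x on y, 224

$x = \beta_{xy}\,\sigma_x\,\frac{y}{\sigma_y} = r_{xy}\,\frac{\sigma_x}{\sigma_y}\,y$

X on Y, 221

$X = r_{xy}\,\frac{\sigma_x}{\sigma_y}\,(Y - M_y) + M_x$

y on x, 223

$y = \beta_{yx}\,\sigma_y\,\frac{x}{\sigma_x} = r_{yx}\,\frac{\sigma_y}{\sigma_x}\,x$

Y on X, 219

$Y = r_{xy}\,\frac{\sigma_y}{\sigma_x}\,(X - M_x) + M_y$

z_x on z_y, 222

$z_x = \beta_{xy}\,z_y$

z_y on z_x, 222

$z_y = \beta_{yx}\,z_x$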

Reliability coefficient 

effect of increasing variability of universe, 477 

“ <rx,*(l - rxx) 




Spearman-Brown prophecy formula for reliability of test as whole, 475

$r_{LL} = \frac{2r_{xx'}}{1 + r_{xx'}}$

Serial correlation, see Biserial r; Quadriserial r; Quintiserial r; Triserial r 
Skewness, 390

$Sk = \frac{C_{90} + C_{10}}{2} - C_{50}$

standard error of, 391

$\sigma_{Sk} = \frac{.5185(C_{90} - C_{10})}{\sqrt{N_s}} = \frac{.5185D}{\sqrt{N_s}}$

Standard deviation for distributions 
grouped data, 163 



grouped data, with guessed mean, 165

$\sigma = i\sqrt{\frac{\Sigma f x'^2}{N} - c^2}$

ungrouped data, 162 



ungrouped data, with guessed mean equal to zero, 243 



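with Sheppard’s correction for broad classes, 168

$\sigma_{corrected} = i\sqrt{\frac{\Sigma f x'^2}{N} - c^2 - .0833}$

standard error of, 380

$\sigma_\sigma = \frac{\sigma}{\sqrt{2N_s}} = \frac{.707\sigma}{\sqrt{N_s}}$, or $\sigma_\sigma = .707\,\sigma_M$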

standard error of a difference between standard deviations, 415 







Standard deviation for special situations
average of two or more distributions, with deviations taken from respective
means, 411

$\sigma = \sqrt{\frac{\Sigma x_1^2 + \Sigma x_2^2 + \cdots + \Sigma x_n^2}{N_1 + N_2 + \cdots + N_n}}$


of a series of ranks, 1 to n, 258

$\sigma = \sqrt{(x_1^2 + x_2^2 + \cdots + x_n^2)/N}$


of n ranks, 258 



of the frequency of p events in a binomial distribution, 334

$\sigma = \sqrt{N_s pq}$

of two or more combined distributions, with deviations taken from weighted 
mean of the combination, 417 

" \ Ni + Nt+---+Nn 

Standard error, see under various statistics: Average deviation; Biserial r;
Centile; Coefficient of Relative Variation; D range; Kurtosis, etc.

Standard error of a difference between two statistics, 401 

for non-correlated samples, 402 

Standard error of a measure (test score), 378

$\sigma_X = \sigma_x\sqrt{1 - r_{xx}}$


Standard error of estimate
for a multiple correlation coefficient, 485

$\sigma_{c \cdot 1234 \ldots n} = \sigma_c\sqrt{1 - R_{c \cdot 1234 \ldots n}^2}$

for mean of one variable predicted from mean of correlated variable, 460 




vT^ 


of x on y, 451

$\sigma_{est.\,x} = \sigma_x\sqrt{1 - r_{xy}^2}$

of y on x, 451

$\sigma_{est.\,y} = \sigma_y\sqrt{1 - r_{xy}^2}$





Standard score, 185

$S = 5.0 + z_x = 5.0 + \frac{X - M_x}{\sigma_x}$




Tercile deviation, 141 

• ~ 2 ” 2 

standard error of, 383

$\sigma_{T.D.} = \frac{.648\sigma}{\sqrt{N_s}}$, or in terms of Q, $\sigma_{T.D.} = \frac{.961Q}{\sqrt{N_s}}$, or in terms of T.D., $\sigma_{T.D.} = \frac{1.50\,T.D.}{\sqrt{N_s}}$

Test of Significance, 343

$T = \frac{\text{(sample measure)} - \text{(parameter measure)}}{\text{standard error of the measure}} = \frac{s - h}{\sigma_s}$

Test of Significance for a difference between two statistics, 403

$T = \frac{\text{sample difference} - \text{parameter difference of zero}}{\text{standard error of difference}} = \frac{(s_x - s_y) - 0}{\sigma_D}$
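The T ratio of the Test of Significance is simply the distance between a sample value s and a hypothetical value h, measured in standard-error units. The Python sketch below applies it to a mean, using the large-sample standard error of the mean, sigma / sqrt(N_s), from the Mean entry. The function name and all figures are hypothetical.

```python
import math


def t_ratio(s, h, se):
    """T = (s - h) / standard error: sample value s, hypothetical value h."""
    return (s - h) / se


# Hypothetical figures: a sample mean of 52.0 from N_s = 100 cases with
# sigma = 10.0, tested against a hypothetical universe mean of 50.0.
se_mean = 10.0 / math.sqrt(100)  # large-sample standard error of the mean
print(t_ratio(52.0, 50.0, se_mean))  # 2.0
```

For a difference between two statistics, the same function applies with s = s_x - s_y, h = 0, and the standard error of the difference as the divisor.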

Tetrachoric coefficient of correlation, 277

$\frac{bc - ad}{y_1 y_2 N^2} = r_t + r_t^2\,\frac{z_1 z_2}{2} + \cdots$


standard error of, for null hypothesis, 389 


Triserial coefficient of correlation, 272 

yiMk + (yc - yk)Mc — 




(yc - y^y . yc’1 


Validity coefficient, effect of increasing variability of universe, 480 

„ _ ^ - »•».’) ’ 

" V—— 

Value of statistic needed for rejection of a hypothesis, 375

$p_s = \sigma_p T + p_h$


z score, 177

$z = \frac{X - M}{\sigma}$





z transformation function for r, 386

$z = \frac{1}{2}\left[\log_e(1 + r) - \log_e(1 - r)\right]$


standard error of, 387 



standard error of a difference between z’s, non-correlated samples, 420 
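Fisher's z transformation, z = 1/2 [log_e(1 + r) - log_e(1 - r)], is the inverse hyperbolic tangent of r, so r can be recovered as tanh(z). The Python sketch below uses only the standard `math` module. The function names are illustrative.

```python
import math


def fisher_z(r):
    """Fisher's z = 1/2 [log_e(1 + r) - log_e(1 - r)]."""
    return 0.5 * (math.log(1 + r) - math.log(1 - r))


def r_from_z(z):
    """Inverse of the transformation: r = tanh(z)."""
    return math.tanh(z)


for r in (0.10, 0.50, 0.90):
    print(r, round(fisher_z(r), 4))
```

Near zero, z and r are almost equal. As r approaches 1, z grows without limit, which is what stretches the crowded upper end of the r scale.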







Abilities, and aptitudes, organization of, 
489 ff. 

vs. aptitudes, 479 

Academic success predicted from two variables, 482-483
Accidental samples, 316 
Accuracy of prediction, 446, 451 ff. 
Achievement test variables, 498 ff. 

Actuarial analysis, 290 
Actuarial prediction, 8 
Addition theorem of probability, 336 
Adequacy, in sampling, 294, 313-314, 353, 
360,371 

of psychological tests, 314 
Aerial photography in sampling, 309 
Age, as a control factor in sampling, 300 ff. 
as an index of maturity, 485 
of college freshmen, 126 
Age differences in variability, 415-418 
Agricultural census, 309 
Alienation, coefficient of, 452 ff., 486 n., 490 
Allport, G. W., 9 n.

Alternate-forms method of test reliability, 
471,473-474 

Ambiguous trichotomy, 28 
Amen, E. W., 23-24, 59-61, 431 
American Institute of Public Opinion, 287, 
315 

Amount-limit test, 466 
Analytical statistics, 10-11, 283 ff. 

Andrew, D. M., 466 n. 

Approximate measures, 16-17 
Aptitudes, definition of, 469, 479 
Areal sampling, 306-309 
Areal units in sampling, 298 


Areas of normal probability curve, 179, 264, 
508-511 

Arithmetic ability, 486 
Arithmetic mean, see Mean 
Army Alpha, 190-192, 380, 464, 473 n. 
Army Beta, 464 
Array, 101-103 

Association, Coefficient of, 91 ff. 
Asymmetrical distributions, 349 ff. 

Attitude scales, 27-28 

Attributes, classification and enumeration 
of, 19 ff. 

Average deviation, 150, 168-170 
P.E. of, 394
standard error of, 380 

Average of standard deviations for two or 
more samples, 411, 417 
Averaging ranks, 256 

inter-correlation coefficients, 476 

Bar graphs, 59-64, 71 
Bar trend graphs, 61-64 
Barlow’s tables, 121 

Barometer, reliability and validity of, 465 ff. 
Barr, A. S., 506 
Bell, E. P., 4 n. 

Bell, H. M., 473 

Bell-shaped distribution, 129,174 ff., 431 ff. 
Belt graphs, 64-67 
Bennett, G. K., 187, 190, 191 
Bennett Mechanical Comprehension Test, 
186 ff., 400 

Bernoulli, Jacques, 3, 4 
Bernoulli, Nicolas, 3 

Bernreuter Personality Inventory, 190-192
Bernreuter scores, 100 ff., 160


Biased samples, 292-294, 314 
Bilateral symmetry, 152,175 
Binet I.Q., 378 
Binet intelligence test, 472 
Binet mental age, 54 
Bingham, W. V., 185 n., 516 
Binomial, and normal probability curve, 
331-332, 340 ff. 
for any power of n, 338 
when p ≠ q, 350-353

Binomial expansion, when n = 2, 332-335;
when n = 3, 332-338; when n = 6 and
12, 339-340

Biserial coefficient of correlation, 259 ff.,
338

Biserial r, and test item analysis, 259, 481 
P.E. of, 395

standard error of, 388-389 
Bi-variate data, 205 ff.

Body measurements, 493 ff.

cluster analysis of, 493-498 
Bowditch, A. P., 14 
Box method of tallying, 36, 112 
Brigham, G. C., 380 
British Institute of Public Opinion, 287 
Broadcasting Magazine, 147 
Brown, W., 474 n. 

Bryan, A. I., 503 n. 

Bucknell College, 498
Buell, Bradley, 73 n. 

Bureau, of Agricultural Economics, 307 
of the Census, 307 
Buros, O. K., 506 

Cantril, Hadley, 287 n., 309 n., 315 
Castenada, Carlos, 5 
Categorical data, 13-14, 19 ff., 424
comparison of, 43 ff.
correlation of, 80 ff.
cross-tabulation of, 80 ff.
graphic methods for, 58 ff.

Categories, mutually exclusive, 21 
Cattell, J. M., 316 
Causal relations, 422, 489 
Causality and statistical relations, 330 
Census, 4-5,11 
data, 13, 285 

enumeration districts, 307-308 
vs. sample, 11, 285, 319 
Centile, definition, 127-128 
Centile graph, 131-133, 143,147 
Centile intervals, 127 ff.

Centile measure, of kurtosis, 392-393 
of skewness, 390-392 
Centile measures, 139 ff. 

summary of, 130 
Centile method, 127 ff., 381
Centile point values, ff. 


Centile values, checking of, 136-138
computation of, 134 ff. 

Centiles, 127 ff. 

determined by graphic method, 131 ff. 
P.E. of, 394-395 
standard error of, 381 ff. 

Tests of Significance for, 381-383 
Central tendency, 131, 139,152,154 
C.g.s. system, 470 

Chance errors, 286, 288, 293, 313, 332 
Chance expectancy for cross-tabulated attributes, 438 ff.

Chance factors, 356 
Chance hypothesis, 431 

See also Null hypothesis 
Chappell, M. N., 296 n. 

Character, of sample results, 353 
of samples vs. size of samples, 314-316 
Charts, bar type, 71 
circular, 67-68 
class interval limits, 109 
correlation, 81 ff., 207, 214, 231 
pictorial, 72-78 
purpose of, 58 

See also Graphs 

Chesire, L., 275 n., 408 n., 476 n., 506 
Chi-square, 373, 424 ff.
and Contingency Coefficient, 443 
and test item analysis, 482 
as Test of Significance, 430 
probability, 429, 515 

Test of Significance, for correlation between dichotomized attributes, 437-441;
for independence of two attributes, 437 ff.; for trichotomy, 431; for variable
distributions, 431-437
vs. centile analysis of skewness and kurtosis, 437 f.

Circular charts, 67-68 
City College of New York opinion poll, 34 ff. 
Class interval, 103-111 
mathematical limits of, 105-108 
mid-value of, 108-109 
scale limits of, 109-110 
size of, 104-105 
Classes and subclasses, 22 ff. 

Classification, 19 ff. 

of judgments, attitudes, and opinions, 25 ff. 
rules of, 24-25 

Clerical efficiency predicted from two variables, 484

Clerical occupations, 478 
Clerical proficiency test, 261 ff. 

Clerical success, 469 
Cluster analysis, 464 
and factor analysis, 489 ff. 
of body measurements, 493-498 
of psychological variables, 498-503 





Coefficient, alienation, 452 ff., 486 n., 490 
Association, 91-92 
correlation, see Correlation 
determination, 489 

mean square contingency, 86,90,94,253n. 
non-determination, 490 
Relative Variation, 171-172,418-419 
reliability, 470 ff. 

Risk, 365 
validity, 480 

Cohen, M. R., 25 n., 34 n. 

Coleman, J. H., 319 n. 

Collective, 285 

Columbia Broadcasting System, 80,421 
Combinations in sampling, 332 ff. 

Common determiners, 493 
Common factors, 491, 498 
Communality of functions, 498 
Component factors, 489 ff. 

Confidence criteria, 364 ff., 372 
and likelihood, 360 ff. 
for significance of a difference, 403-404 
Confidence levels, 364 ff., 429 ff. 

0.1% level, 366, 429 
1% level, 403 
2% level, 365 
5% level, 365-366 
in terms of P.E., 396-397
in terms of T ratios, 366 ff. 

Confidence limits, 371 ff. 
centiles, 381-382 
chi-square, 429 ff. 

I.Q. scores, 378-379 
means, 377 
percentages, 374 
predictive estimates, 462 
reliability of a sample r, 387-388 
standard deviations, 379 
testing continuum of hypotheses, 368 ff. 
Conklin, E. G., 22 n. 

Conrad, H. S., 464 n. 

Constant errors, 293 
and size of samples, 297 
Consumer expenditures in N. Y. State, 67
Consumer's brand preference, 426-427 
Contingency Coefficient, 86, 90, 94-98 
from chi-square, 443 
maximum values, 97 
Continuous series, 15-16, 127
Continuum, 16 
of hypotheses, 368 ff. 

Control group, 407 

Controlled experimentation and sampling, 
319-321 

Controls in sampling, 299 ff., 320 
Converting rho to r, 388 
Copy testing, 321 
Cornfield, J., 306 n. 


Correlation, 6-8, 24, 80 ff., 195 ff., 253 ff., 
437 ff., 445 ff. 

and causal relationships, 422, 489
and chi-square, 437 ff. 
and heterogeneity in age, 486-487 
between: correlation coefficients, 421;
dichotomized attributes, 90 ff., 276 ff., 
437-441; two proportions, 407 
by method, of differences, 248-249; of 
sums, 247-248, 475 
Contingency Coefficient, 94-98 
cross-tabulation essential to, 80 
distribution, 207 
in descriptive statistics, 80 ff. 
in sampling statistics, 80 
index of test reliability, 470 ff. 
methods for evaluation of psychological 
tests, 464 ff. 

non-variable attributes, 84-86, 91-92, 
437 ff. 

of categorical data, 80 ff. 
of dichotomized variables, 92-94, 258 ff., 
276 ff. 

of intelligence test scores and grades, 
457-459 

of “memory factors," 472 

of polytomous attributes, 86 ff. 

of ranks, 253-258 

phi coefficient, 92-94, 253 n. 

predictive meaning, 445 ff. 

product-moment method, 91, 195 ff., 445 ff.

product-moment r and φ, 93

rank-difference method, 254 ff. 

rank-product method, 254 

rank-sum method, 254 

spurious, 205, 487

standard error of, see Standard error 
surfaces, 446 

Correlation chart, 81 ff., 206 ff., 214, 231 
as geometric field, 81-82, 199 
means and standard deviations from, 235 
Correlation coefficients, standard errors of, 
384 ff. 

Correlation profile, analysis, 492 ff. 
for achievement test variables, 500 
for body measurements variables, 495 
Correlation tally, 111-112, 207 
Correlational frequency, 203-206 
Co-variability, 195 ff. 

Co-variation, 8 
Critical ratio, 366 
Critical scores, 469 
Crossley, Inc., 147 

Cross-section vs. representative samples, 312 
Cross-tabulation, 37 
essential to correlation, 80 ff. 
of bi-variate data, 197 ff. 
of categorical data, 80 





Cumulative frequency distribution, 121-122
Cureton, E. E., 261 n., 272, 492 n.

Curve of error, 4, 285, 326 

See also Normal probability curve 

D range, 130, 140 
P.E. of, 395 
standard error of, 383 
Data, 13-15 

of categories, 19 ff.; of variables, 99 ff.
Deciles, 128-130 
P.E. of, 395

standard error of D1 and D9, 383
Degrees of freedom, 373, 399 n., 428 ff., 438
for test of independence, 441 
Deming, W. E., 293, 294 n., 296 n., 307 n. 
de Moivre, 4 

Dependent samples, 319, 402 
Descriptive statistics, 4 ff.

summary of methods, 283-284 
Determination, coefficient of, 489-490 
Deviation, 176 
mean, see Average deviation 
measures of, 140-141, 161 ff.

Dewey, T. E., 405-406 
Dichotomization, of bi-variates, 276 ff.

of height-weight measures, 276 
Dichotomized attributes, 437-441 
Dichotomized variables, and serial correlation, 258 ff.
correlation of, 92-94 
Dichotomous classification, 19-20 
and division, 20 if. 

Dichotomy, 90 

on normal, bell-type distribution, 265
Differences between statistics, Tests of 
Significance for, 401 ff.

Differentiation, quantitative, 33 
Digit-span test, 227-228, 248-249, 472 
Diminishing returns in sampling, 300, 305, 
320 

Discontinuous series, 15-16 
Discrete data, 16 

Discrete sampling distributions, 333 ff.
Dispersion, 100, 140 
Distance, measurement of, 470 
Distribution, bell-shaped, 129, 174 ff., 431 ff.
J-type, 129, 175 n. 

normal probability, see Normal probability curve

of chi-square, 427 ff., 515
of frequencies, 112 ff. 
of sample results, 323-324 (see also Sampling distribution)
of t, 398, 514
rectangular, 129 
skewed, 349-353
U-type, 175 n.


Division, 19 ff. 

by exact criteria, 21 
D.K.%, 29 ff., 87-88, 405
Type I, 30 
Type II, 30 

Doubtful inferences, 365 
Dreyfuss, M., 34 n. 

Du Bois, P. H., 254 n., 256 n. 

Duncan, A. J., 350 n., 506 
Dunlap, J. W., 260, 506 
Dunlap’s formula for biserial correlation, 
260 

Eaton, R. M., 25 n. 

Edgerton, H. G., 506 

Efficiency, of predicted estimates better 
than a guess, 455 
of prediction, 446, 451 ff. 

Empirical appraisal of a test, 478 
Empiricism and research, 360 ff. 
Enumeration, 19 ff.
of attributes, 19 
vs. measurement, 33-34 
Equated groups in sampling, 318-321 
Equiprobability, 331-332 
Equivalence of psychological tests, 473 
Error, and precision in sampling, 355 
of estimate, 451 ff. 
of measurement, 285, 326 
sources of, in sampling and measurement, 
294 

See also Probable error; Standard 
error 

Errors, constant vs. chance, 293 
in stratified sampling, 311-312 
of observation, 285, 356, 465 
of prediction, 314 
of sampling, 285-288 

See also Chance errors 
Evaluation of psychological tests, 464 ff. 
Exact measures, 16-17 
Experimental group, 407 
Experimental method, and sampling, 321- 
322 

of equated groups in sampling, 318-321
of matched groups, 407 
Experimental science, 360-361 
Extra-chance factors, 384 
Ezekiel, Mordecai, 506 

Factor analysis, 464, 489 ff. 

Fermat, 3, 4
Fiducial limits, 371 
Findex system, 39-41 
Finite populations, 289 
Fisher, R. A., 12, 286 n., 295 n., 348, 371,
386, 397 n., 401 n., 410 n., 424 n., 425,
427 n., 430 n., 506, 514 n., 515 n., 518 n.





Fisher's null hypothesis, 410-412 
Fisher's t statistic, 348-349, 355, 397-399, 
514 

Fisher's z function, 384, 386-387, 400, 420, 
476 

and Pearson's r, values of, 386, 518 
P.E. of, 395 

standard error of, 387, 420 
Flanagan, J. C., 481 n., 499 n. 

Form of sampling distributions, 331 ff.
Formulas, glossary of, 551-563 
Fortune polls, 404-405 
Fortune Survey, 75, 78
Fourfold table, 80 ff., 276 ff., 438-440
Frequencies, and chi-square, 424 ff. 

at mean of normal distribution, 432
Frequency, P.E. of, 394
score value of, 135 
standard error of, 353, 375 
Tests of Significance for, 375-376 
Frequency distribution, 103 ff. 

cumulative, 121 ff. 

Frequency polygon, 117-118 
vs. histogram, advantages, 118-120 
Frequency theory of probability, 328 
Functional validity, 465, 478
Functions of r, values of, 516-517 
Fundamentum divisionis, 25

Gallup, George, 78, 287, 288, 292, 310, 318, 
373 

Gallup poll, 78, 286 ff., 315, 317, 373-374, 
400 

Galton, Sir Francis, 6, 8, 14, 91, 196, 206,
209 n. 

Garrett, H. E., 503 n. 

Gauss, 4 

General science factors, 501 

Geometric field and correlation, 81, 82, 199 

Girschick, M. A., 493 n. 

Glossary, of formulas, 551-563 
of symbols, 547-549 
Godfrey, E. H., 4 
Goldman, E. F., 506
Gosset, W. S., 348 
Grade scores, 126, 447 ff. 

Graphic methods for categorical data, 48- 
78 

Graphs, bar, 59-64, 71 
belt, 64-67 

binomial distributions, 334, 345, 350
centile, 132, 143, 147 
chi-square distributions, 427 
correlation profile, 495, 500 
cumulative frequency, 123, 124
error of estimate, 455-457 
frequency polygon, 117, 118, 120, 121
histogram, 115, 116, 119 


individual psychograph, 190-191 
J-type, 129 
line, 117-120 

normal probability curve, 7, 129, 174, 180,
265, 345, 362, 369, 396, 406 
normal curve fitted to sample distribution, 
433 

percentage cumulative distribution, 123 
percentage frequency distribution, 121 
pictorial, 78 

predictive estimates, 449, 450 
profile, 190-191
psychograph, 190-191 
rectangular distribution, 129
relation of P.E. to σ, 396
sampling distributions, 334, 345, 350, 362, 
369, 406 

scatter of correlation frequencies, 198, 
201-203, 445 

variability in sample results, 358 
See also Charts 
Group factors, 491, 498 
Guessed mean, 157 
Guilford, J. P., 481 n. 

Gulliksen, H. O., 272 n., 506 

Halley, 5 

Hallonquist, Tore, 80 n., 119 n., 407 n., 421 n.
Hand-sorting, 37-38 
Hansen, M. H., 296 n., 307 n. 

Height measurements of infants, 204 
Heterogeneity, in age, 486, 493 
in sampling, 293, 315 
of matched samples, 414 
Hidden factors in correlation, 487 
Higgons, R. A., 120 n., 198 n. 

Histogram, 11^117 

advantages over frequency polygon, 118- 
120 

Homo sapiens, 23 
Homogeneity in sampling, 292-293 
Homoscedasticity, 452 n. 

Hooper, C. E., 296 
Hoover, Herbert, 306
Hotelling, H., 491, 492 
House-to-house interviewing, 297 
Hull, Clark, 236 n., 459 
Human traits, organization of, 489 
Huygens, Christian, 3 
Hypotheses, 325, 360 ff., 424 
about distributions of frequencies, 424 ff. 
and Tests of Significance, 360 ff.
of “no difference,” 401
of zero difference, 410 

See also Null hypothesis 
Hypothetical frequencies, for normal distributions, 432 ff.
for test of independence, 439 ff. 





I.B.M. card, 38, 236 
Identification in measurement, 33 
Ignorant samples, 316 
Independence values, 95-96, 439 ff. 
Independent samples, 319, 402 
Index numbers, 52-55 
of predictive efficiency, 459-460 
of reliability, 465 
Individual differences, 34 ff. 

Inertia of large numbers, principle of, 297 
Infants’ height-weight measurements, 204 
Infants’ sitting ages, 120 ff. 

Infinite populations, 289 
Initial sampling units, 298 
Insignificant differences, 403 
Institute of Public Administration, 65, 66, 
69-71 

Intelligence, 470, 479 
and G, 491 

Intelligence Quotient, 54 
Intelligence test scores, 100, 126, 190-192, 
378-379, 447 ff.

Inter-correlation, coefficients, 238-239 
of factors in sampling, 304 
Inter-quartile range, 128-130 
Inter-tercile range, 128-130, 141
Interest, 469 

Interest ratings, Strong, 100
Internal controls in sampling, 306, 311 
Interviewing, 297 

Intra-group differences in sampling, 317-318 
Invariant relationship, 195, 210 
I.Q., 376-379, 492 
I.Q. index, 54 
Item inter-correlation, 471 
and test reliability, 476 
Item reliability and validity, 481 

J-type distribution, 129, 175 n.

Jaspen, Nathan, 272, 273, 275 
Jessen, R. J., 309 n. 

Judgments, attitudes, and opinions, 25 ff. 

Kelley, T. L., 107, 390, 452, 491, 506
Kellogg, L. S., 506 

Kendall, M. G., 15 n., 91 n., 295 n., 424 n.,
506 

Kenney, J. F., 506 
King, A. J., 309 n. 

Klineberg, O., 409-411 
Koren, John, 4 n. 

Kuder, G. F., 473 n. 

Kurtosis, 348, 372 n., 390, 392-393, 431 ff. 

standard error of, 392 
Kurtz, A. K., 506 

Landon, 292 
Laplace, 4 


Large sample theory, 343, 355-356, 367 
Tests of Significance for, 360 ff. 
vs. small sample theory, 324 
Large samples vs. small samples, 349 
Larrabee, H. A., 360 n. 

Laws of chance, 331 
Lazarsfeld, Paul, 29 n., 31, 407 n. 

Learning curves, 58 

Least squares, method of, 225 

Length factors, 498 

Leptokurtic sampling distributions, 347- 
348, 353, 355 
Leptokurtosis, 348, 393 
Lerrigo, Ruth, 73 n. 

Likelihood, 329-330 
and confidence criteria, 360 ff. 

Likely hypotheses, 329, 360 ff., 397, 454
Likely results in sampling, 340, 358-359, 366
Likert, Rensis, 309 
Limits, class intervals, 103 ff. 
tenable hypotheses, 368-369 
untenable hypotheses, 368-369 
Line graph, 116-120 

Linear correlation, see Product-moment r
Linear regression lines for bi-variate distributions, 208 ff.

Link, H. C., 444 n. 

Linnaeus, 23 

Literary Digest poll, 292, 306 
Literature factors, 501 
Locke, N. M., 227 n., 472 n. 

Logical division, rules, 24-25 
Longstaff, H. P., 466 n. 

Lottery methods in sampling, 295 

McNemar, Quinn, 316 n., 414 n. 

Machine method for product-moment r, 236 

Machine tabulation, 38-39 

Mail-ballot poll, 306 

Map charts, 69, 70

Maps, 68-72 

Market research, 17, 29 ff., 86 ff., 296, 
306 ff., 361 ff. 

Master sample, technique of, 309-310 
Matched groups, 412 
experimental method of, 407 
Matched samples, 319-321, 407 
Matching pairs, 407 
Mathematical factors, 501 
Mathematical limits of class intervals, 105 ff. 
Mean, 7, 54, 150 ff. 
as fulcrum, 175 

as measure of central tendency, 152, 154 
as typical measure, 152 
as point of reference, 175 
correction for, 158-159 
for data grouped into class intervals, 
154 ff. 





for ungrouped data, 15^154 
from guessed mean, 155-160 
of series of ranks, 258 
probable error of, 394 
reliability of, 377 
standard error of, 376 
Test of Significance for, 376-377 
Mean deviation, 168 

See also Average deviation 
Mean differences, 409 ff. 

Mean frequency in binomial distribution, 
333 

Means, and standard deviations from correlation chart, 235
of independent samples, 409 ff. 

Measure of scatter, 452 ff. 

P.E. of, 394
standard error of, 378 
Measurement, enumeration vs., 33-34 
errors of, 285, 326 

Mechanical Comprehension test, 186-188, 
400 

Median, 128, 131, 135, 139
P.E. of, 394
standard error of, 382 
Median inter-correlation coefficient, 476 
Mental age scores, 380 
Merrill, M. A., 378 n., 379, 414 n. 
Mesokurtosis, 348, 353, 392-393 
Mid-case, 139 n. 

Mid-values of class intervals, 108-109 
Minnesota Vocational Test for Clerical 
Workers, 466 ff., 478 
Modal frequencies, 351-352 
Mode, 175 

of binomial distribution, 351 
Moments, 150 n. 

Multiple choice method, 26 
Multiple correlation, 481, 482-485, 489 
Multiple-factor theories, 491 
Multiple regression equation, 484 

Nagel, Ernest, 25 n., 33 n., 34 n. 
Name-checking, 466 
Necessary inference, 361 
Negative correlation, 201 
and prediction, 450-451 
Negative numbers in psychological scales, 
114 

New York Daily News poll, 315
New York Herald Tribune, 404 n. 

New York State Teachers Association, 64, 67
New York Times, 373 n. 

New York Times Magazine, 76 
New York World-Telegram, 78 
Non-chance factors, 489 
Non-correlated samples, 404 
Non-determination, coefficient of, 490 


Non-linear correlation, 203 
Non-variable attributes, 19 ff., 283, 425 
comparison of, 43 ff. 
correlation of, 84-86, 91-92, 437 ff. 
Normal bell-shaped distribution, see Normal 
probability curve 
Normal correlation surfaces, 446 
Normal curve, asymptotic character of, 176 
formula for, 184 
implications of, 174 ff. 
of error, 285, 326 

Normal distribution, measures of variability for, 184

Normal probability, and binomial distributions, 340-341

and skewed sampling distributions, 349 ff. 
Normal probability curve, 4, 6, 7, 14, 129,
174 ff., 183-184, 264 ff., 285, 331 ff.,
431 ff., 508-511

fitted to sample distribution, 432-433 
See also Graphs, normal probability curve

Normal probability distribution, 347-348, 
396, 508-511 

Normal sampling distributions, 333 ff. 
Norms for Bennett Mechanical Comprehension test, 186 ff.

Northrop, M. S., 308 n. 

Null hypothesis, 384-385, 388, 401, 403, 
410-412, 425, 431, 437, 442-443 
Number-checking, 466 

O’Brien, R., 493 n. 

Observation, errors of, 285, 356, 465 
Obtained measures, 322 
Occupational categories, 64 
Odd-even method of reliability, 471, 474-476
Office of Public Opinion Research, 287 n., 
315 

Ogive, 123-124 
Operational unities, 492 
Operational validity, 465, 478 
Opinion poll, 34 ff. 

Opinion questionnaire, 421 
Ordered series, 14 

Ordinates of normal probability curve, 264, 

432 ff., 508-511 

Organization, and interrelation of psychological functions, 489 ff.
of human traits, 489 
Original score value from a z-score, 215

Paired associates in correlation, 205-206 
Parameter, 10, 322, 342 ff. 

Parameter differences, 401 
of zero, 403 ff. 

Partial correlation, 481, 485-487, 493
Partial investigations in sampling, 316-317 





Partial samples, 320 
Pascal, 3, 4

Paterson, D. G., 466 n. 

Payne, S. L., 308 n. 

Pearson, E. S., 506 

Pearson, Karl, 86, 91, 93, 94, 171, 197, 225,
279, 284, 424, 428 n., 440, 506 
Pearson Coefficient of Relative Variation, 
171-172, 418-419 

Pearson product-moment r, 91, 195 ff., 445 ff.
Pearson r, by method of differences, 248-249 
by method of sums, 247-248 
Pearson short-cut computation of χ², 440-441

Peatman, J. G., 80 n., 105 n., 119 n., 120 n., 
177 n., 198 n., 227 n., 321 n., 407 n., 
421 n., 445 n., 464 n., 472 n., 543 n. 

Per capita cost of education, 52-53 
Per capita income, 63-64 
Per capita indices, 52-53 
Percentage value of a frequency, 121 
Percentages, 43-46, 49-52 
confusion in use of, 55-58 
cumulative frequency distribution, 122- 
126 

differences, 404 ff. 
errors in averaging, 57-58 
frequencies, 120 

frequency distribution, 120-121 
P.E. of, 394 

sampling distributions, 362 
standard error of, 373 
Tests of Significance for, 373-374 
Percentiles, 127 

See also Centiles
Perl, R. E., 503 n. 

Permutations, 332, 337 
Personality differences, 23 
Peters, C. C., 225 n., 247 n., 341 n., 356 n., 
383 n., 401 n., 482 n., 506 
Phi coefficient of correlation, 92-94, 253 
and test item analysis, 482 
P.E. of, 395 
standard error of, 389 
Philip, M., 225 n., 339 n. 

Philip II, King of Spain, 5 
Physical dimensions, 496 
Pictograph Corporation, 76 
Pictorial charts, 72-78 
Pie diagrams, 67-68 
Platykurtosis, 348 
Point binomial, 331 ff. 

Point-biserial correlation, 270-272 
Points of inflection and σ, 175-176
Polytomous attributes, correlation of, 86 ff. 
Polytomous classification, 19-21 
Population, 285 

Populations, finite and infinite, 289 
Potts-Bennett Tests, 190-192


Practical English usage factors, 501 
Precision, 353 ff. 
and reliability, 355-359 
and size of samples, 313, 353 ff. 
function of √N, 354-355
in sampling, 313-314 
measured by standard error, 353 
Predictions as average estimates, 447 ff. 
Predictive efficiency, in correlation, 451 ff. 
index of, 459-460 
of battery of tests, 462, 482 ff. 
of combined tests, 482-483 
Predictive meaning of correlation, 445 ff. 
Primary control factor in all sampling, 311 
Primary mental factors, 492 
Princeton University Office of Public 
Opinion Research, 287 n., 315 
Probability, 3, 4, 328 ff. 

and Tests of Significance, 360 ff. 
definition of, 328 
implications of, P.E., 396 
of chi-square, 427 ff. 
of result, 372 

product and addition theorems of, 336-337
theory of, 328 ff. 

Probability curve, see Normal probability 
curve 

Probability distributions, 179, 334-335, 347, 
396 

Probability estimates, and likelihood, 329- 
330, 360 ff. 

for normal distributions, 341 ff. 
Probability ratio, 329 
Probability values, chi-square, 429, 515 
normal, bell-shaped distribution, 179,
508-511 

proportions, 356 

t of small samples, 398-399, 514 
T, 512-513 

Probable error, 183, 326, 356
and standard error, 396 
and Tests of Significance, 393-397 
of: a centile, 394; an arithmetic mean, 
394; an average deviation, 394; biserial 
r, 395; D range, 395; estimate, 451 n.; 
Fisher’s z function, 395; frequency, 394; 
mean, 394; measure, 3^; median, 394; 
percentage, 394; product-moment r, 
395; proportion, 394; quartile deviation, 394; rho, 395; statistic, 326, 393;
tercile deviation, 394-395 
Product deviations, 233-234 
Product-moment correlation, 195 ff., 445 
and phi, 92-94 
and rho, 234 ff. 
and serial correlation, 258 ff. 
by method, of differences, 248-249; of 
sums, 247-248 
computation of, 225 ff. 





confidence limits for reliability, 387 
estimation of, 208 ff.
from grouped data, 229-235 
from ungrouped data, 226-229 
functions of, 516-517 
machine method for, 236 ff.

P.E. of, 395

practical meaning of, 445 ff. 
predictive implications of, 445 ff. 
sampling distributions of, 324 
special methods for, 253 ff. 
standard error of, 384 
Tests of Significance for, 384 ff. 

Product theorem of probability, 336 
Proficiency, 479 
Profile analysis, 492 ff. 

Profile chart, 188-193, 190-191
Program Analyzer, 80, 119 n., 407 n., 421 
Prophecy formula, 474-475
Proportion, P.E. of, 394
standard error of, 375 
Proportions, 43 ff. 

Tests of Significance for, 375 
values for p and q, 267, 519 
Psychograph, 188-193 
Psychological Corporation, 87, 187, 190-191 
Psychological functions, organization and
interrelation, 489 ff. 

Psychological tests, 464 ff. 
and reliability, 314, 466-468, 470 ff. 
and validity, 314, 468 ff., 478 ff. 
Psychological variables, cluster analysis, 
498-503 

Public Affairs Committee, Inc., 73, 75, 77 
Public opinion research, 286-288, 290, 292, 
298, 300, 306 ff., 315 
Punch card, 12, 38 ff. 


Quadratic equation, 276-278 
Quadriserial r, 272-273 
Quantitative differences, 13-14
Quartile deviation, 140-141, 382 
P.E. of, 394 
standard error of, 383 
Quartiles, 128-130 
P.E. of, 394 
standard error of, 382 
Quetelet, 3, 4, 5-6, 14, 17, 196 
Quintiles, 128-130 
Quintiserial r, 272, 275 
Quota method of sampling, 310-311 

r, functions of, 516-517 
values of, for k, 453, 516-517; for Fisher’s 
z function, 386, 518 
See also Correlation; Product-moment correlation
R, multiple correlation, 482-485


Radio research, 119, 296, 407, 421 
Radio Station WOR, 147 
Random numbers, 295-296, 543-545 
Random samples, 294 ff., 310 ff. 

Random sampling, see Sampling 
Randomization, primary control factor in 
sampling, 299 ff., 311 
principle of, 294-299 
Range, 99-101 

Rank-difference method of correlation, 
254-258

Rank-product method of correlation, 254 
Rank-sum method of correlation, 254
Ranking test scores, 256
Ratios, 43-55 
Raw data, 33 
Reaction-time, 466, 470 
Reciprocal of N, 121 
Reciprocals, table of, 522-541 
Rectangular distribution, 129 
Reduction, of data, 11-13, 19 
of sampling error by stratification, 301 ff. 
Refined data, 33 
Regression, 209 n., 450-451 
Regression coefficients, 222-223 
Regression equations, x on y, 224, 447
X on Y, 221, 447
y on x, 223, 447
Y on X, 219, 447
z_x on z_y, 220-222
z_y on z_x, 218-219
Regression line, 208 ff., 446 
for z_y on z_x, 217-218

Relations between measures of variability, 
184 

Relationships, 8-9, see Correlation 
Relative precision in sampling, 355-359 
Relative variability, 171, 418
Reliability, 160 n., 465 ff. 
and precision in sampling, 355-359, 369 
effect of restricted range of ability, 
477 

of a mean, 376 
of a sample r, 387 
of a standard deviation, 379-380 
of a statistic, 371 

of a test, 314, 464 ff.; by alternate-forms 
method, 473-474; by item-correlation, 
476; by retest method, 471-473; by 
split-half method, 474-476 
of intelligence test scores, 378-379 
of test items, 481 
of test scores, 378-379, 464 ff. 

Reliability coefficient, 467 
Representative samples vs. typical cross- 
section, 312 

Representativeness in sampling, 312 
and precision, 313-314 
Restricted universes in sampling, 316-317 




Rho correlation coefficient, 254-258 
P.E. of, 395 

relation of, to r, 257-258 
standard error of, 388 
Richardson, M. W., 271 n. 

Risk, Coefficient of, 365 
Roosevelt, F. D., 292, 306, 405-406
Roper, Elmo, 75, 78, 288, 404-405 
Rounding off numbers, 47-49 

Saffir, M., 275, 408 n., 476 n., 506 
Sample, 11, 17, 34
Sample frequencies, 425 ff. 

Sample types, 290 ff. 

Samples, 283 ff.
biased, 292-294, 314 
character of, 353; vs. size, 314-316 
dependent, 319, 402 
ignorant, 316 
independent, 319, 402 
matched, 319-321, 407 
random, 294 ff., 310 ff.
representative, 291-292, 312 
simple, 311 

Sampling, accidental, 316 
adequacy, 313-314 
and experimental method, 321-322 
and test norms, 307 
as research technique, 286 ff.
bias in, 292-294, 314 
controlled, 311 
in U. S. Census of 1940, 296 
internal controls, 306 
inter-relation of stratifying factors, 302- 
303 

intra-group differences, 317-318 
methods of, 290 ff.

normal probability curve, 324, 331, 341 ff.

partial investigations, 316-317 

precision, 313-314, 353 ff. 

primary control factor, 299 

randomization, 294 ff., 310 ff. 

repeated, in market research, 370 

restricted universe, 316 

skewed distributions, 350-353 

small sample vs. large sample theory, 324 

stratified-quota method, 310-311 

stratified-random, 299 ff. 

techniques, 283 ff. 

theory and cluster analysis, 491 

unit, 297-299; initial vs. basic, 298-299 

variations in prediction, 454 

vs. census, 285, 319 

Sampling distribution, 323-324, 331 ff.,
350 ff., 360 ff., 372
leptokurtic, 353 

of a difference between two percentages, 406
of chi-square, 427 ff. 


Sampling errors, 285-288, 311-312 
reduced by stratification, 301 ff. 

Sampling statistics, 9 ff., 283 ff. 

and experimental science, 360-361 
Scale, 15 

of test difficulty, 188 
Scatter, measure of, 451 ff. 

of correlation frequencies, 445 
Scattergram, 197 ff. 

Schafer, Roy, 543 n. 

Schedule, 11 
of information, 34-35 
Scholastic aptitude, 482 ff. 

School expenditures, 251-252 
School Life, 251 n.

Schools as sampling units, 298-299 
Scientific method, 24-25, 285-286 
Score, standard error of, 378-379, 467 
Score value of frequency, 135 
Scores, Standard, 54, 185-186
z, 177-178 

See also Test scores 

Secret-ballot technique in sampling, 309 
Segmented variables, 258 ff. 

Selective Service System, 295 
Self-correlation, 495 

See also Reliability of a test 
Serial correlation, 258 ff. 

Series, 15-16 

Sex as control factor in sampling, 300 ff. 

Sex differences in variability, 414
Sex ratio, 14, 55 
Shen, Eugene, 414 n. 

Sheppard, W. F., 167, 200 n. 

Sheppard’s correction for σ, 167-168, 208
Short-cut methods, correlation, 229 ff., 253 ff.
mean, 155 ff. 

standard deviation, 163 ff. 

Sigma, 162 

See also Standard deviation 
Significance, and null hypothesis, 385 
of a difference, confidence criteria for, 
403-404 

of sample results, 363 ff. 

Simple enumeration, method of, 16 
Simple samples, 311 
Simple sampling, 294 
Sitting ages of infants, 120 ff. 

Size of samples, precision (reliability) of, 
353 ff. 

Skewed sampling distributions, 349 ff. 

and Standard scores, 188 
Skewness, 139, 159, 372, 390-392, 431- 
432 

and kurtosis, chi-square vs. centile analysis, 437

in sampling distributions, 324 
standard error of, 391 





Small sample theory, 347 ff., 355-356
vs. large sample theory, 324
Small samples, Tests of Significance for, 
397-399 
vs. large, 349 
Smith, B. B., 295 n. 

Smith, J. G., 350 n., 506 
Smith, y. G., 53 n., 142 n. 

Social intelligence, 482 
Social statistics, 3 

Spearman, Charles, 253, 279, 474 n., 490, 
491, 498 

Spearman-Brown prophecy formula, 474- 
476 

Spearman’s general factor, G, 490-491 
Spearman’s rank-difference method of correlation, 254-258

Spearman’s two-factor theory, 490-491 
Specific factors, 490, 498, 501 
Sperry Gyroscope Co., 319 n. 

Split-half method of test reliability, 471, 
474-476

Split-half reliability by differences, 475-476 
Split-run copy testing, 321 
Spurious correlation, 205, 487 
Square roots, table of, 522-541 
Squares, table of, 522-541 
Stalnaker, J. M., 271 n. 

Standard deviation, 54, 150, 160 ff., 243-244
for ungrouped data, 161-162
from correlation chart, 235 
of a frequency in a binomial distribution, 
334 

of a series of ranks, 258 
of sampling distributions, 325, see also 
Standard error 

of two or more combined groups, 417 
P.E. of, 394 
reliability of, 379-380 
Sheppard’s correction for, 167-168 
short method of computation, 163-167 
standard error of, 380 
Test of Significance for, 379-380 
Standard error, 342 
an average deviation, 380-381
biserial r, 388-389 
centile, 381-383 
Coefficient of Association, 389 
Coefficient of Relative Variation, 418 
correlation coefficient, 384, 388-389 
D range, 383 
D1 and D9, 383

difference between: any two statistics,
401 ff.; coefficients of relative variation,
418; correlation coefficients, 419 ff.;
means, 409 ff.; percentages, 404 ff.;
proportions, 404 ff.; standard deviations,
414 ff.; z functions, 420-421


estimate, 446 ff., 451 ff.; for Fisher’s z
function, 387; for R, 484; of the mean,
460; of x on y, 451; of y on x, 451
frequency, 353, 375 
kurtosis, 392 
mean, 376 

measure, 378-379, 467
median, 382 

percentage, 357-358, 373 
product-moment correlation coefficient, 
384 

proportion, 354, 356, 375 
quartile, 382 
quartile deviation, 383 
rank-difference correlation, 388 
skewness, 391 
standard deviation, 380 
statistic, 325, 353 
tercile, 381 
tercile deviation, 383 
test score, 378-379, 467 
tetrachoric correlation coefficient, 389 
Standard measures, centile implications of, 
178-182 

Standard score, 54, 185-186
and skewed distributions, 188 n. 
norms, 186-188 
profile chart, 188 ff. 

Stanford Binet, 54, 376-379, 473 n. 

Stanton, Frank, 407 n. 

Statistic, 10, 322-323 
value needed for rejection of hypothesis, 
375 

Statistical data, 13 ff. 

Statistical frequencies, 36, 424 
Statistical hypotheses, 325, 360 ff., 371-372, 
424 

and parameter values, 368 
Statistical inference, 9, 328 ff. 

Statistical population, 288 
Statistical probability, 328 ff. 

Statistical terminology, 9 ff., 285 ff., 
322 ff. 

Statistical Tests of Significance, 360 ff., 
401 ff., 424 ff. 

Statistical universe, 285, 288-290, 303, 316- 
317, 322-323, 371 

Statistical variable, definition, 15-16 
Statistics, 322 

actuarial nature, 8-9 
definitions, 9-11 
Stature, 497 
Steinzor, Bernard, 153 
Stephan, F. F., 296 
Stereotypes, 24 
Stewart, M. S., 75 n. 

Stock, J. S., 309 n. 

Straight-line function, 209 





Strata controls in sampling, 299 ff. 
Stratification, 34 ff. 
classes and subclasses, 22 ff. 
reduction of sampling errors by, 301 ff. 
secondary control factor in sampling, 
299 ff., 311 
Stratified matrix, 36 
Stratified samples, 299 ff. 

Stratified sampling, error in, 311-312 
and representativeness, 312 
Stratified-random sampling, 299 ff., 353
Stratifying factors, in sampling, 300 
inter-relation of, 302-305 
Strong, E. K., 26 
Strong Interest Ratings, 100 
Strong’s Interest Inventory, 26 
Student, 348, 356 n.

Sub-samples, 298 ff., 407 
Sub-universes in sampling, 305-306 
Symbols, glossary of, 547-549 

T, test ratio, 363 ff., 372 ff. 

T ratio, 342 ff., 397 
as confidence criteria, 366 ff. 
in terms of P.E., 396-397 
t statistic, 348-349, 355, 397-399 
t values for small sample theory, 397, 514 
T values for large sample theory, 512-513 
Tabulation, 19 ff., 111 ff.

Tally, 111-112 

box method, 36, 112 
correlation, 111 n., 207
Tally sheet, 36 
Taylor, E. K., 267 n., 519 n. 

Teachers’ salaries, comparison of, 142 ff. 
Tentative hypotheses, 397 
Tentative inferences, 365 
Tercile deviation, 141 
P.E. of, 394-395 
standard error of, 383 
Terciles, 128-130 
standard error of, 381 
Terman, L. M., 378 n., 379 
Terminology for sampling statistics, 322 ff. 
Test batteries and predictive efficiency, 462, 
482 ff. 

Test battery validity, 480 
Test equivalence, 473
Test evaluation, 464 ff. 

Test item analysis, 268, 481-482
and biserial r, 259, 481 
chi-square, 482 
phi correlation, 482 
tetrachoric r, 482 
Test items, 259, 481-482 
reliability of, 481 
Test norms, 186 ff. 
and sampling, 307 


Test of clerical proficiency, 261 ff. 

Test ratio (T), 342 ff., 363 ff., 372 ff., 397- 
398 

Test reliability, 465 ff. 
and standard error of test score, 467 
by method: of alternate forms, 473-474; 
of item-intercorrelation, 476; of split 
halves, 474-475; of test-retest, 471-473 
determination of, 470 ff. 
effect of range of ability on, 476 
Test-retest measure of reliability, 471-473 
Test scores, 17, 100, 186-193, 261 ff., 466 
reliability of, 378-379 
standard error of, 360 ff., 378, 467 
Test-tube sample, 294, 300 ff. 

Tests of Significance, 288, 348-349, 360 ff., 
372 ff., 401 ff. 
and P.E., 393-397 
for continuum of hypotheses, 368 ff. 
for correlation coefficients other than r, 
388-390 

for difference between: any two statistics,
401-404; arithmetic means, 409; coefficients
of relative variation, 418-419; percentages
(proportions), 404-409; product-moment
coefficients of correlation, 419-422;
standard deviations, 414-418

for form of variate distribution, 431 ff. 

for frequencies, 375-376 

for kurtosis, 392-393 

for large sample theory, 397-399 

for mean, 376-377 

for percentages, 373-374 

for predictive estimates, 461-462 

for product moment r, 384 ff. 

for proportions, 375 

for skewness, 390-392 

for small sample theory, 397-399 

for standard deviations, 379-380 

for test scores, 378-379, 467-468

for trichotomy, 431 

for variable distributions, 431-437 

logic of, 371 ff. 

Tetrachoric correlation, 275-279, 389, 408,
476 

and test item analysis, 482 
Thomson, G. H., 491, 506 
Thorndike, E. L., 206, 491 
Thorndike Intelligence Test Scores, 457 
Thurstone, L. L., 275, 408 n., 476 n., 491, 
492, 506 

Thurstone’s Computing Diagrams, 276-279,
408, 476

Time, measurement of, 470 
Time series, 58, 64, 119 
Tippett, L. H. C., 295 n. 

Torricelli barometer, 465 
Traits, organization of, 489 ff. 

Trichotomy, 22, 26, 86 ff., 272 ff., 431 
and correlation, 86 ff., 272 ff. 

Triserial r, 272-273 
True measures, 322 
Tryon, R. C., 491 n., 492 ff. 

Tryon’s method of correlation profile 
analysis, 492 ff. 

Turnbull, W., 309 n. 

Two-factor theory, 490 

U-shaped distribution, 175
Uni-modal distributions, 152, 175, 431-432 
Unit of sampling, 297-299 
Universe, 285, 288-290, 303, 316-317, 
322-323, 371 

Universes, actual, 289-290 
hypothetical, 289-290 
Unlikely hypotheses, 329, 360 ff., 397, 454 
Unlikely results in sampling, 340, 358-359, 
366 

Upper critical score, 469 
U. S. Census, 55, 66, 296 
U. S. Department of Treasury, 74 

Validity, functional, 465, 478 
of test, 314, 464 ff.
of test batteries, 480 
of test items, 481
operational, 465, 478 

Validity coefficient, effect of increase in 
variability, 480 
of a test, 263 
Validity criteria, 479 

Van Voorhis, W. R., 225 n., 247 n., 341 n., 
356 n., 383 n., 401 n., 482 n., 506 
Variability, differences, 414-419 
in man as sampling unit, 319 
in sample results, 358 
in terms of P.E., 326, 393-397
measures of, 140-141, 160 ff., 184 


of normal probability distribution, 341 ff. 
of sample results, 362-363 
of sampling distributions, 324, 354-356 
Variable, 7, 52 
definition of, 15-16 
Variable attributes, 99 ff., 283, 425 
Variable data, 13-14, 27-28
Variance, 162, 244, 490 
of paired differences, 248; sums, 248 
Variates, 7, 14

Variation, Coefficient of Relative, 171-172, 
418-419 

Vigintiles, 128-130 
Vital statistics, 3-5 
Vocabulary test item, 268 
Volume factors, 498 
von Mises, R., 328 n. 

Walker, H. M., 3 n., 4 n., 428 n. 

Webb, J. N., 308 n. 

Wechsler, David, 415 n., 416 
Wechsler-Bellevue Scale, 415-419, 423, 479 
Wechsler Information test, 415-419 
Weight, 497 

Weight measurements of infants, 204 
Weighted mean of two or more groups
combined, 417

Weighting tests, 484-485 
Wundt, W., 316 

Yates, F., 295 n., 427 n., 506 
Yule, G. U., 15 n., 91, 92, 424 n., 506 

z function, 384, 386-387, 400, 420, 476, 518 
standard error of, 387 
z score correlation chart, 213 ff.

z scores, 177 ff.

and centile values, 178-182 
Zero on psychological scales, 114
Zubin, J., 321 n. 





